AI Data Engineer
Build the data infrastructure powering FactualIQ’s enterprise Decision Engine. Develop scalable pipelines, implement high-performance data systems, and enable intelligent workflows that support mission-critical AI decision-making for enterprise clients.
About the Role
As an AI Data Engineer, you’ll build and maintain the data infrastructure that powers FactualIQ’s Decision Engine at enterprise scale. You’ll implement data systems, build pipelines, and develop the data processing infrastructure that enables intelligent workflows for Fortune 1000 clients. This is a hands-on technical role where you’ll work alongside senior engineers to build reliable, performant data systems that support client-critical AI decision-making while growing your expertise in advanced data engineering for AI applications.
What You’ll Do
- Implement and maintain data pipelines that support hundreds of concurrent agent workflows with reliable performance for real-time decision support.
- Build and optimize database systems, including relational databases (PostgreSQL, MySQL) and vector databases (Pinecone, Weaviate, Qdrant), implementing indexing strategies and query-optimization patterns.
- Develop semantic search capabilities including embedding pipelines, hybrid search implementations combining vector and traditional search, and result ranking systems.
- Build data processing workflows for agent context management including ETL pipelines, batch processing systems, and incremental data update patterns.
- Implement data models for AI workflows including temporal patterns, multi-tenant data structures, and schema versioning approaches.
- Develop data quality and monitoring systems including validation frameworks, data drift detection, pipeline health checks, and automated alerting.
- Build caching systems for semantic data including multi-level cache implementations, cache invalidation logic, and performance optimization.
- Implement feature engineering pipelines including feature computation, versioning systems, and serving infrastructure for low-latency access.
- Develop data governance components including data classification, PII detection and handling, and audit logging for compliance requirements.
- Optimize data system performance including query tuning, index design, and cost-effective data storage strategies.
- Participate in sprint planning, contribute to technical discussions, and maintain clear documentation for data systems and processes.
- Stay current with emerging best practices in data engineering and AI data systems, incorporating learnings into your work.
What You’ll Bring
- Bachelor’s degree in Computer Science, Data Engineering, Statistics, or related technical field (or equivalent practical experience).
- 4+ years of production data engineering experience with hands-on development of data pipelines, database systems, and data processing workflows.
- Strong Python proficiency with solid understanding of data processing libraries (Pandas, Polars, DuckDB) and performance considerations for data workloads.
- Production experience with relational databases (PostgreSQL, MySQL) including query optimization, index design, and understanding of scaling approaches.
- Experience with vector databases and semantic search (Pinecone, Weaviate, Qdrant, ChromaDB) including implementation of search systems and understanding of embedding-based retrieval.
- Experience with data pipeline orchestration tools (Airflow, Prefect, Dagster) including workflow design and error handling patterns.
- Strong SQL skills including complex queries, window functions, CTEs, and ability to optimize query performance.
- Understanding of data modeling approaches including normalized schemas, dimensional modeling, and trade-offs for different use cases.
- Experience with cloud data platforms (AWS RDS/Aurora, GCP Cloud SQL, Azure Database) and infrastructure provisioning.
- Familiarity with embedding models and semantic processing including model selection, chunking approaches, and quality evaluation.
- Experience building RAG (Retrieval Augmented Generation) systems including chunking strategies, context optimization, and retrieval patterns.
- Understanding of data quality practices including monitoring, validation, and incident response.
- Experience working in multi-tenant environments with awareness of data isolation and tenant-level optimization needs.
- Strong communication skills and ability to collaborate effectively with other engineers and stakeholders.
What You Might Bring
- Experience with knowledge graph concepts and graph databases (Neo4j, Neptune) for representing complex relationships.
- Familiarity with streaming data systems (Kafka, Kinesis) and real-time data processing patterns.
- Background in machine learning concepts including feature engineering, data augmentation, and model training data preparation.
- Experience with time-series databases (InfluxDB, TimescaleDB) or analytical databases (ClickHouse, DuckDB).
- Understanding of data mesh principles including data product thinking and domain-oriented data design.
- Knowledge of database internals including storage engines, index structures, and query execution.
- Experience mentoring junior engineers or contributing to team knowledge sharing.
- Contributions to open-source projects, technical blog posts, or community involvement in data engineering topics.
- Relevant certifications (AWS Database Specialty, GCP Data Engineer, or similar).
What We Value
- A growth mindset, building on the recognition that a good engineer is always learning.
- Creative, entrepreneurial flexibility to try innovative approaches to solving problems, coupled with the resilience to recognize mistakes quickly, adapt, and correct course as needed.
- Speed to solutions, with rapid, well-planned iterations.
- Design-forward approaches to building technology products, coupled with a test-heavy practice that ensures both the problem to be solved and the solution context are clear before development begins.
- Transparent, frequent, and constructive communication practices.
- Low-ego collaboration, where feedback is valued, everyone’s voice is heard, debates and disagreements are used for the team’s benefit, and commitment matters.
- Mission alignment and care for delivering highest-standard quality to support our clients’ success.
Reporting
The role currently reports to FactualIQ’s President. As we build out our engineering team, the role will ultimately report to a Tech Lead or Senior Tech Lead.
Career progression from this role will lead to a Senior AI Data Engineer position.