Director, Data & Analytics Engineering

Salary

$250,000 - $325,000

Location

New York, NY

Posted

Today

As the Director, Data & Analytics Engineering, you will lead a team of Data and Analytics Engineers building the data foundation of the Charlie Health Operating Model, an AI-first model where the warehouse is no longer a passive reporting layer, but the substrate of our World Model: a continuously updated, causal representation of every patient, provider, payor, and operational signal in our ecosystem.

You will own the pipelines, schemas, and storage systems that fuel both human and machine consumers, powering our 1,600+ dbt models, our real-time Intelligence Layer, and the agentic systems composing interventions on top of it. You will push us beyond Snowflake batch cycles into event-driven signal graphs that detect drift the moment a cohort's trajectory diverges from baseline, and you will architect the semantic retrieval surfaces that make our long-form clinical and operational artifacts AI-ready.

This is a hands-on, Player-Coach leadership role. You will set the technical vision, run sprint ceremonies and roadmap planning, mentor ICs and a Data Engineering Manager, support DRIs running cross-cutting 90-day missions, and stay close enough to the code to make sharp architectural calls and unblock the team. You will partner with leaders across ML, Data Science, BizOps, Product, and Engineering and own the shape of the data model architecture that the next decade of Charlie Health is built on.

We're a team of passionate, forward-thinking professionals eager to take on the behavioral health crisis and play a formative role in providing life-saving solutions. If you're inspired by our mission and energized by the idea that the distance between identifying an operational friction and shipping a clinical intervention should be measured in days rather than months, apply today.

Responsibilities

Own the warehouse and pipeline architecture that powers the Charlie Health World Model, the causal, continuously updated representation of our ecosystem that fuels reporting, product analytics, and the agentic Intelligence Layer
Drive the long-term roadmap for AI-ready data: real-time signal graphs beyond batch cycles, schema-validated event flows, and semantic retrieval surfaces for transcripts, notes, and other long-form artifacts
Define and execute a vision for scalable data model architecture in Snowflake, evolving 1,600+ existing dbt models toward extensible, well-governed, agent-consumable patterns
Partner with Machine Learning, Data Science, Engineering, Product, and BizOps to align data initiatives with company priorities and unlock new opportunities, particularly the agentic capabilities riding on top of the World Model
Lead execution end-to-end: sprint ceremonies, roadmap planning, and prioritization, with the discipline to keep a growing team shipping high-quality, reliable solutions on schedule
Oversee data integrity, documentation, testing, monitoring, and provenance across systems so that stakeholders, human and agent, can trust and self-serve on our data
Guide and contribute to architecture and design decisions across our stack (Dagster, Snowflake, dbt, Fivetran, Hightouch, Hex, Tableau), and resolve critical technical issues as a hands-on technical leader
Own the data infrastructure that powers company-wide KPI reporting, ad hoc analysis, product analytics, and dashboard development, ensuring the underlying models, pipelines, and semantic layer enable downstream teams to deliver clear, accurate insights as business questions evolve
Drive reliability, scalability, observability, security, and cost-efficiency improvements across the data stack, including the PHI gatekeeping and HITRUST-aligned patterns that govern signal ingestion
Identify bottlenecks and implement improvements to team workflows, tools, and development practices
Manage and mentor a growing team of Data and Analytics Engineers and one Data Engineering Manager. Operate as a Player-Coach: focus on craft, mentorship, high-leverage code review, and supporting DRIs running 90-day cross-cutting missions
Establish metrics that track progress, communicate priorities, and demonstrate business impact
Define and maintain data governance standards and proactively manage stakeholder expectations to drive scalable, trusted data use

Requirements

10+ years of data or analytics engineering experience, including 4+ years managing and mentoring data and analytics engineers with a focus on execution and delivery
Proven ability to drive team operations, including sprint ceremonies, roadmap planning, and prioritization across multiple workstreams
Hands-on background building and maintaining ELT pipelines using SQL, dbt, and OLAP databases like Snowflake
Proficiency in Python (preferred) or another programming language, at a level strong enough to guide and review technical work
Experience with workflow orchestration tools like Dagster, Airflow, or Prefect
Skilled at building scalable reporting solutions in Tableau or Hex and enabling self-serve analytics across the organization
Strong data modeling, provenance, and governance skills to support extensible, trusted, and consistent reporting and modeling patterns
Track record of improving team processes, optimizing workflows, and delivering measurable impact
Effective cross-functional partner with Machine Learning, Data Science, Product, and Engineering, aligning data work with business goals
Clear, concise communicator who can translate complex technical concepts for non-technical stakeholders
Comfortable navigating ambiguity, breaking down complex problems, and driving iterative solutions
Committed people leader with a history of coaching talent and fostering a high-performance, inclusive team culture

This role requires 4 days per week in our NYC office (Flatiron District)

Nice to haves

Familiarity with healthcare data standards (HIPAA, FHIR, HITRUST)
Experience with AWS cloud technologies
Experience working in a startup environment
Exposure to event-driven architectures (EventBridge, Kafka, CloudEvents) and JSON Schema registries
Familiarity with vector stores (pgvector, Pinecone) and patterns for semantic retrieval over operational data
Experience supporting ML or agentic AI workloads as a data consumer: designing schemas, features, or context surfaces that downstream models and agents rely on
Awareness of LLMOps tooling and what it takes to keep AI systems observable and PHI-safe
