Fivetran and dbt in the Agentic Future
I have written bits here and there about both Fivetran and dbt previously, so before putting this piece together, I pulled my prior opinions from GPT. And it gave me these:
1)
2)
3)
4)
But all that was written prior to the agentic revolution. Agents don’t behave like data analysts. They speculate: issue many overlapping queries, branch, retry, and steer based on partial results and feedback.
There is an argument to be made that we need agent-first data systems to cope with scale, heterogeneity, redundancy, and steerability - characteristics distinct from classic BI/ETL workloads. While the Fivetran-dbt era was optimized for dashboard creation and insights, the next era appears to be optimized for what comes after the dashboards: outcomes.
So with all that, a short revisit is necessary.
ETL Consolidation
To recap, Fivetran has previously acquired Tobiko Data (the team behind SQLMesh and SQLGlot). Separately, dbt Labs acquired SDF Labs earlier this year. The widely reported Fivetran ↔ dbt Labs deal is in advanced talks but not publicly closed. If it does close, SDF would come along for the ride under the dbt umbrella, meaning one company would control the leading connector platform (Fivetran), the most adopted transformation framework (dbt), a next-gen typed SQL toolchain (SDF), and a stateful transformation engine (SQLMesh).
Potential implications for the ecosystem are:
Consolidation of the “T” (and more): Connect (Fivetran) + Transform (dbt/SDF/SQLMesh) positions Fivetran as a de facto end-to-end data pipeline vendor. Great for procurement; dangerous for optionality if primitives don’t stay open. Tobiko says SQLMesh/SQLGlot will remain open source, but it remains to be seen whether that holds.
Overlapping roadmaps: dbt’s compiler + SDF’s SQL comprehension + SQLMesh’s state management are partially redundant. Expect convergence pressure: one planner, one type system[1], one lineage substrate[2]. The migration path (macro syntax, state stores[3], test/eval semantics) will make or break trust.
Community stewardship risk: dbt/SDF/SQLMesh each have opinionated communities. A single owner raises the bar for transparent governance, long-term OSS commitments, and fair plugin interfaces. This is where the current approach (open APIs, closed products) can backfire.
Competitive posture vs. cloud platforms: A combined stack is a clearer counterweight to Snowflake/Databricks “one-stop” narratives. Expect heavier bundling, deeper optimizer integrations, and “AI-ready” marketing.
Agentic Future - Out with Data Infra, In with Outcomes…
The big question, though, is how much of the above legacy stack is relevant to the agentic future. In a world in which the AI community cares much more about obtaining outcomes without needing specialized infra, orchestration, and data-pipeline skills, there is demand for a new kind of stack.
The idea is that you should be able to state intent in business terms, and the system builds a typed semantic plan[4] (entities, metrics, governance policy) and proposes data paths across sources. It then auto-generates tests, sample sets, and a minimal “proof” run before scaling, and captures domain knowledge continuously rather than through one-off modeling.
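To make that more concrete, here is a minimal sketch of what such a typed semantic plan could look like as a data structure. Everything below is hypothetical: the class names, fields, and the plan_from_intent helper are illustrative assumptions, not an existing API.

```python
# Hypothetical sketch of a typed semantic plan built from a business-level intent.
# None of these names correspond to an existing library; they only illustrate the shape.
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str                 # e.g. "account"
    grain: str                # e.g. "one row per account_id"
    sources: list[str]        # candidate upstream tables


@dataclass
class Metric:
    name: str                 # e.g. "new_accounts_eu_weekly"
    entity: str               # entity the metric is measured over
    expression: str           # SQL-ish definition, e.g. "count(distinct account_id)"
    filters: list[str] = field(default_factory=list)


@dataclass
class Policy:
    name: str                 # e.g. "pii_masking"
    applies_to: list[str]     # columns or entities the policy governs


@dataclass
class SemanticPlan:
    intent: str               # the original business question
    entities: list[Entity]
    metrics: list[Metric]
    policies: list[Policy]


def plan_from_intent(intent: str) -> SemanticPlan:
    """Stub: a real system would resolve the intent against captured domain
    knowledge; here one example is hard-coded for illustration."""
    account = Entity("account", "one row per account_id",
                     ["crm.accounts", "billing.customers"])
    new_accounts = Metric("new_accounts_eu_weekly", "account",
                          "count(distinct account_id)",
                          ["region = 'EU'", "created_at >= date_trunc('week', now())"])
    return SemanticPlan(intent, [account], [new_accounts],
                        [Policy("pii_masking", ["account.email"])])


plan = plan_from_intent("How many new European accounts did we add last week?")
print(plan.metrics[0].expression)
```

The point is that entities, metrics, and policies become typed objects an agent can validate and reason over, rather than prose in a ticket.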
With every prompt, domain knowledge is accumulated as first-class assets (concepts, constraints, SLAs, policies), and any change to a definition is versioned along with its explanation.
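A hedged sketch of what such a versioned definition change might look like as a record; the field names are made up purely for illustration:

```python
# Hypothetical shape of a versioned definition change; no real tool's schema is implied.
from dataclasses import dataclass


@dataclass
class DefinitionChange:
    asset: str        # e.g. "metric:new_accounts_eu_weekly"
    version: int      # monotonically increasing per asset
    diff: str         # what changed in the definition
    explanation: str  # why it changed, captured from the prompt or review
    approved_by: str  # human or policy that signed off


change = DefinitionChange(
    asset="metric:new_accounts_eu_weekly",
    version=2,
    diff="filter created_at in UTC instead of local time zones",
    explanation="Finance reports week-over-week growth in UTC",
    approved_by="analytics-governance",
)
```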
In this new stack, data pipelines are self-healing. When schemas drift or upstream quality dips, the system proposes edits (with diffs, impact analysis), runs shadow plans, and performs rollback (or forward deployments) with approvals.
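Here is a minimal, self-contained sketch of that loop, assuming hypothetical drift-detection and shadow-run hooks; no existing tool's API is implied:

```python
# Minimal sketch of a self-healing pipeline loop. Every function and class here is
# hypothetical and exists only to illustrate the control flow described above.
from dataclasses import dataclass


@dataclass
class Proposal:
    diff: str                  # proposed change to the transform code
    impacted_models: list[str]


def detect_drift(schema_before: dict, schema_now: dict) -> list[str]:
    """Columns that disappeared or changed type upstream."""
    return [c for c, t in schema_before.items() if schema_now.get(c) != t]


def propose_edit(drifted: list[str]) -> Proposal:
    # A real system would let an agent rewrite the affected SQL; we just record intent.
    return Proposal(diff=f"re-map columns: {drifted}", impacted_models=["stg_accounts"])


def shadow_run_matches(proposal: Proposal) -> bool:
    # Placeholder for "run the patched plan on a copy and compare row counts/metrics".
    return True


def heal(schema_before: dict, schema_now: dict, approved: bool) -> str:
    drifted = detect_drift(schema_before, schema_now)
    if not drifted:
        return "healthy"
    proposal = propose_edit(drifted)
    if shadow_run_matches(proposal) and approved:
        return f"deployed: {proposal.diff}"     # roll forward with approval
    return "rolled back to last known-good state"


print(heal({"account_id": "int", "email": "text"},
           {"account_id": "int", "email_address": "text"},
           approved=True))
```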
However, existing Data-AI tools appear to fall far short of this vision. Most remain features rather than complete, outcome-focused products: AI designed to pull specific data, often missing the larger context of why that data is needed.
The key here is that these changes will not happen overnight; warehouses and lakehouses won’t disappear any time soon. In the near term, the pragmatic path is to augment existing infra with: (1) typed semantics and static analysis (SDF-like), (2) stateful, multi-engine transforms (SQLMesh-like), and (3) agent-aware orchestration layers that budget, cache, and checkpoint aggressively (sketched after the list below). Cloud vendors and platforms are already moving to support “agentic AI in production,” but the data layer still needs to catch up. Most likely:
Near term: You will be able to deliver agentic outcomes by bolting planner layers onto your current warehouse + streaming + MCP + vector DB stack.
Medium term: Expect agent-first data systems to emerge: query planners[5] and storage built around speculative, branching workloads, with native governance.
Bottom line: expect new (agentic) interfaces, sophisticated retrieval optimizers, new storage with caching[6] abilities, and entirely new governance engines for fine-grained permission handling.
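As a rough illustration of point (3) above, here is a sketch of an agent-aware broker that budgets, caches, and checkpoints overlapping speculative queries. All names are hypothetical, the cost model is deliberately crude, and the executor is whatever callable you hand it.

```python
# Minimal sketch of an agent-aware orchestration layer that budgets, caches, and
# checkpoints speculative queries. All names are hypothetical; no existing API implied.
import hashlib
import json


class AgentQueryBroker:
    def __init__(self, budget_units: int):
        self.budget = budget_units            # crude cost budget for one agent session
        self.cache: dict[str, list] = {}      # query fingerprint -> cached rows
        self.checkpoints: list[dict] = []     # partial results an agent can resume from

    def _fingerprint(self, sql: str) -> str:
        return hashlib.sha256(sql.encode()).hexdigest()

    def run(self, sql: str, execute, cost: int = 1):
        """execute is any callable taking SQL and returning rows (assumed, not prescribed)."""
        key = self._fingerprint(sql)
        if key in self.cache:                 # overlapping speculative branches hit the cache
            return self.cache[key]
        if cost > self.budget:
            raise RuntimeError("budget exhausted; agent must narrow its speculation")
        rows = execute(sql)
        self.budget -= cost
        self.cache[key] = rows
        self.checkpoints.append({"sql": sql, "rows": len(rows)})
        return rows


# Toy usage with a fake executor standing in for a real engine.
broker = AgentQueryBroker(budget_units=5)
fake_execute = lambda sql: [{"n": 42}]
broker.run("select count(*) as n from accounts", fake_execute, cost=2)
broker.run("select count(*) as n from accounts", fake_execute, cost=2)  # served from cache
print(broker.budget, json.dumps(broker.checkpoints))
```

Caching on a query fingerprint is what lets many overlapping, branching agent queries avoid recomputation, while the budget and checkpoints keep the speculation steerable.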
Today, most of that is handled manually by specialized infra engineers. No AI can handle spec-driven development of the data infra. In the case of dbt, for instance, an analytics engineer captures this logic in some sort of semantic layer[7] - a part of the stack that resides somewhere between the ETL step and the end-user query. This gives a lot of flexibility, but also comes at a heavy cost: developing the model and maintaining the codebase is a lot of work. Nothing significant has changed since Benn Stancil’s original 2023 quote:
The demos run in small sandboxes that don’t resemble real data environments. The test questions are basic, unambiguous, and could be answered by an experienced analyst in a handful of lines of SQL. The distance between current AI start-ups focusing on Data and companies’ actual data ecosystems is staggering. dbt Labs’ CEO, Tristan Handy, said that a meaningful percentage of their customers are using dbt to create more than 5,000 tables. In that environment, there are no simple questions—answering “How many new European accounts did we add last week?” requires defining an account, what new means, where Europe is, when a week starts, in what time zone it should be calculated in, and if “add” means net or gross.
Taking into account all of the above, I could not shake the feeling that I no longer see the Fivetran <> dbt merger as something that can have a significant impact on the future. We might actually be at a pivotal moment in history in which these tools risk becoming irrelevant. Even industry leaders like Snowflake acknowledge the shift – calling themselves an ‘AI data cloud’ – but, as Amplify notes, that’s often ‘bolting new marketing onto old architectures’. The Fivetran/dbt merger could prove no different.
Opportunities for new AI-native dev tooling
If the dbt acquisition completes, Fivetran won’t just “own transformation.” It will own three overlapping philosophies of transformation:
dbt: project-oriented modeling with human-maintained semantics,
SDF: typed SQL comprehension and static analysis,
SQLMesh: stateful planning and multi-engine recomputation
That could be fantastic if it’s unified into an AI-native dev surface that starts from intent and ends with governed outcomes. Or it could calcify the last era: three compilers, two planners, and one vendor bundle.
More importantly, this by itself does not guarantee that Fivetran is moving away from infrastructure processes toward delivering outcomes.
In an outcomes-first world, standalone connectors and “T” engines become liabilities: more surface area to integrate, govern, secure, and maintain, without guaranteeing the result the business actually needs. Together, Fivetran + dbt can move teams closer, but real gaps remain, and it’s not obvious that the two can be seamlessly unified in practice.
There is a very real opportunity now to rebuild this stack the AI-native way: let AI maintain the codebase, continuously capture domain context, and orchestrate from intent to answer. Five or ten years ago that sounded unrealistic. Today, it’s within reach.
But even more importantly, it is highly unclear what the future holds for traditional data engineers. That’s a post for another time. For now, a picture will do (via x.com/@klabsintexas):

That’s the bet behind KLabs’ new AI-native DevTool, designed to collapse the distance between a question and a reliable result while the system handles semantics, pipelines, and governance in the background. The bet is clear: outcomes as the product, with AI maintaining the code and the system absorbing the complexity of typed semantics, stateful planning, evals, governance, and agent-aware orchestration. Domain experts focus on what needs to happen, not how to wire it together.
So my “take” on the acquisition is simple: it’s a milestone of the last era. The next era belongs to platforms that make outcomes the product, not the by-product…
[1] Components that parse/plan SQL and enforce data types
[2] System tracking how data flows/transforms across models
[3] Storage for job/run state to enable reliable incremental transforms
[4] A unified interface where the AI translates high-level business requests into data workflows
[5] Planners that can fork and explore multiple paths
[6] Data layers designed to reuse results and reduce recomputation
[7] Shared business definitions (metrics/entities) between data and queries