What if you have more than 400 dbt models? Is DbtDag still the best option?
Is it better to have 400 tasks instead of just 3?
You could try DbtTaskGroup to group them together. Plus, you can split the dbt project into several DAGs (e.g. by domain).
But if you feel more comfortable with three BashOperators, that's totally fine. I had this setup and it worked well.
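For reference, a rough sketch of the per-domain split with Cosmos (the project path, profile details, and the `path:models/finance` selector are placeholders for your own setup):

```python
from datetime import datetime

from airflow import DAG
from cosmos import DbtTaskGroup, ProfileConfig, ProjectConfig, RenderConfig

with DAG("finance_domain_dag", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
    # Render only one domain's models into this DAG instead of all 400+.
    finance = DbtTaskGroup(
        group_id="finance_models",
        project_config=ProjectConfig("/opt/airflow/dbt/my_project"),  # placeholder path
        profile_config=ProfileConfig(
            profile_name="my_project",  # placeholder profile
            target_name="prod",
            profiles_yml_filepath="/opt/airflow/dbt/profiles.yml",
        ),
        # dbt selector limiting this group to a single domain
        render_config=RenderConfig(select=["path:models/finance"]),
    )
```

One DAG per domain keeps each graph small while still giving you model-level retries inside the group.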
Do you know if the DbtDag implementation is equivalent to running `dbt run --models <model>` for each model while traversing the graph, or is there some parallelization happening behind the scenes?
If so, each task will occupy one slot in Airflow, so if you have multiple pipelines running at the same time and sharing the same pool, your pipelines will start to slow down waiting for a free slot, won't they?
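One workaround I can imagine is giving all dbt tasks their own pool so they don't compete with other pipelines. A rough sketch, assuming Cosmos passes `operator_args` through to every rendered task (`dbt_models` is a made-up pool name, created beforehand with `airflow pools set dbt_models 8 "slots for dbt tasks"`):

```python
from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig

dag = DbtDag(
    dag_id="dbt_pooled",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    project_config=ProjectConfig("/opt/airflow/dbt/my_project"),  # placeholder path
    profile_config=ProfileConfig(
        profile_name="my_project",  # placeholder profile
        target_name="prod",
        profiles_yml_filepath="/opt/airflow/dbt/profiles.yml",
    ),
    # Every rendered model task should land in this pool, capping
    # dbt concurrency at the pool's slot count.
    operator_args={"pool": "dbt_models"},
)
```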
Nice article. What about running in k8s using Docker and the KubernetesPodOperator?
Sure, it's totally possible if you are using Kubernetes! I suppose it's going to be similar to the BashOperator, since you are going to run dbt as a CLI command.
I was just thinking that it's missing from the article, and this is the way we use it. It's probably the most stable setup for bigger dbt projects with Airflow if you're not using the cloud.
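A minimal sketch of that setup (the image name, namespace, and profiles dir are placeholders for your own project, and the import path varies with the cncf-kubernetes provider version):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG("dbt_on_k8s", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
    # Same idea as the BashOperator, just inside a pod: the Docker
    # image has dbt and the project baked in.
    dbt_run = KubernetesPodOperator(
        task_id="dbt_run",
        name="dbt-run",
        namespace="airflow",  # placeholder namespace
        image="my-registry/dbt-project:latest",  # placeholder image
        cmds=["dbt"],
        arguments=["run", "--profiles-dir", "."],
        get_logs=True,
    )
```

The nice part is that dbt's dependencies live in the image, so you never fight the Airflow workers' Python environment.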