What if you have more than 400 dbt models? Is DbtDag still the best option?
Is it better to have 400 tasks instead of just 3?
You could try DbtTaskGroup to group them together. Plus, you can split the dbt project into several DAGs (e.g. by domain).
But if you feel more comfortable with three BashOperators, that's totally fine. I had this setup and it worked well.
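For reference, a rough sketch of the per-domain split with Cosmos (the project path, profile details, and the `path:models/finance` selector are placeholders for your own setup):

```python
from datetime import datetime

from airflow import DAG
from cosmos import DbtTaskGroup, ProfileConfig, ProjectConfig, RenderConfig

with DAG("finance_domain_dag", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
    # Render only one domain's models into this DAG instead of all 400+.
    finance = DbtTaskGroup(
        group_id="finance_models",
        project_config=ProjectConfig("/opt/airflow/dbt/my_project"),  # placeholder path
        profile_config=ProfileConfig(
            profile_name="my_project",  # placeholder profile
            target_name="prod",
            profiles_yml_filepath="/opt/airflow/dbt/profiles.yml",
        ),
        # dbt selector limiting this group to a single domain
        render_config=RenderConfig(select=["path:models/finance"]),
    )
```

One DAG per domain keeps each graph small while still giving you model-level retries inside the group.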
Do you know if the DbtDag implementation is equivalent to running `dbt run --models <model>` for each model while traversing the graph, or is there some parallelization happening behind the scenes?
If so, each task will occupy one slot in Airflow, so if you have multiple pipelines running at the same time and sharing the same pool, your pipelines will start to slow down waiting for a free slot, won't they?
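One workaround I can imagine is giving all dbt tasks their own pool so they don't compete with other pipelines. A rough sketch, assuming Cosmos passes `operator_args` through to every rendered task (`dbt_models` is a made-up pool name, created beforehand with `airflow pools set dbt_models 8 "slots for dbt tasks"`):

```python
from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig

dag = DbtDag(
    dag_id="dbt_pooled",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    project_config=ProjectConfig("/opt/airflow/dbt/my_project"),  # placeholder path
    profile_config=ProfileConfig(
        profile_name="my_project",  # placeholder profile
        target_name="prod",
        profiles_yml_filepath="/opt/airflow/dbt/profiles.yml",
    ),
    # Every rendered model task should land in this pool, capping
    # dbt concurrency at the pool's slot count.
    operator_args={"pool": "dbt_models"},
)
```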
Nice article. What about running in k8s using Docker and the KubernetesPodOperator?
Sure, it's totally possible if you are using Kubernetes! I suppose it's going to be similar to the BashOperator, since you are going to run dbt as a CLI command.
I was just thinking that it's missing from the article, and this is the way we use it. It's probably the most stable setup for bigger dbt projects with Airflow if you're not using the cloud.
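A minimal sketch of that setup (the image name, namespace, and profiles dir are placeholders for your own project, and the import path varies with the cncf-kubernetes provider version):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG("dbt_on_k8s", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
    # Same idea as the BashOperator, just inside a pod: the Docker
    # image has dbt and the project baked in.
    dbt_run = KubernetesPodOperator(
        task_id="dbt_run",
        name="dbt-run",
        namespace="airflow",  # placeholder namespace
        image="my-registry/dbt-project:latest",  # placeholder image
        cmds=["dbt"],
        arguments=["run", "--profiles-dir", "."],
        get_logs=True,
    )
```

The nice part is that dbt's dependencies live in the image, so you never fight the Airflow workers' Python environment.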