2023 was an exciting year for the dbt and Analytics Engineering field. There were a lot of changes: different pricing for dbt Cloud, the acquisition of Transform, several dbt version upgrades, and various changes to the product itself. On top of that, the data industry and AI had a pretty fruitful year. We’ve seen the dawn of ChatGPT and LLMs, AI-based start-ups, and numerous applications that really changed the IT industry.
Considering all these developments, let's explore together and try to predict where Analytics Engineering is headed in 2024.
🤖 Tight(er) integration with AI
There is no doubt that AI and LLMs were a major focal point in 2023. Not using ChatGPT in your daily work is akin to digging a hole with a shovel when you have an excavator in your garage. And we’ve seen a lot of applications across all technological fields, including data engineering and analytics in general.
At some point, people started applying LLMs to dbt. For example, this package can read your dbt project and suggest improvements. Well, I doubt the quality of those suggestions matched that of a qualified analytics engineer, but at least it was a start.
A more promising application was writing dbt documentation. This task is more straightforward to describe, and the quality of the output is much better. This extension allows you to write dbt documentation with AI and gives you column-level lineage. Now we’re talking!
And finally, the most promising application, in my opinion, is smart chatbots for analytics. Yes, we’ve seen a rise of text-to-SQL bots that could translate a human request into SQL and return the resulting dataset. But most of them lacked one thing: a deep understanding of the business and its underlying processes. And that is what separated “dumb” chat-SQL bots from human data analysts. But hear me out, there is a better and smarter way.
Jason Ganz and some other folks from dbt Labs explained how they managed to increase the quality of a text-to-SQL bot’s answers by giving it the Semantic Layer as input (instead of a raw data description). They’ve seen a mind-blowing 83% increase in accuracy!
So, yeah, AI is powerful, and it is applicable to dbt and Analytics Engineering. I hope that in 2024 we will see some quality tools that will help us use LLMs more effectively and conveniently🤞.
🏢 dbt for Enterprise
dbt is a business, and like any business, it needs to earn money. In the software industry, particularly for dbt, big companies or enterprises present the greatest opportunity for revenue generation. And no surprise, dbt Labs is rolling out a lot of features for them.
One of the key ideas visible throughout Coalesce 2023 was simplifying the scaling of dbt within organizations. Just watch this keynote video and count how many times they mention the word “organization”.
Notable solutions include:
dbt Explorer, a feature that will replace dbt docs to give a more convenient experience for enterprises (will it even be available for dbt Core users? Dunno 🤷♂️)
dbt Mesh, a set of solutions that will allow companies to build a Data Mesh architecture. dbt developers have been steadily converging on this point by implementing various features, such as cross-project refs (since v1.6) and data contracts (since v1.5). Finally, big customers will be able to try out this new architecture without switching tooling.
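To give a flavor of what a cross-project ref looks like, here is a minimal sketch; the upstream project name `finance` and the model names are hypothetical, so treat this as an illustration rather than a recipe:

```sql
-- models/marts/orders_enriched.sql in a downstream project.
-- The upstream project must first be declared in dependencies.yml:
--   projects:
--     - name: finance
select
    o.order_id,
    c.customer_segment
from {{ ref('orders') }} as o
-- the two-argument form references a public model from another project
left join {{ ref('finance', 'dim_customers') }} as c
    on o.customer_id = c.customer_id
```

The two-argument `ref()` is what lets teams own separate projects while still building on each other’s public models.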
Finally, dbt Cloud now natively supports Microsoft Fabric and recently got a native adapter for Synapse DWH. Microsoft Fabric itself is a very interesting set of technologies and solutions for building data platforms that follow the Data Fabric architecture (as if Data Mesh wasn’t enough). And many big companies that are strategically aligned with Microsoft are going to use Fabric, so dbt wants to be in this market.
What does all this mean for analytics engineers?
I think that, in time, we will see dbt Mesh and some other features appear as requirements in job descriptions once big teams adopt them. For us, it means we need to learn these technologies even if we don’t plan to use them on current projects. This is gonna be the way to stay competitive on the job market.
📦 Further development of ecosystem
One of the things I love about dbt is the availability of numerous external packages and how easy it is to integrate them into your project. Many dbt packages require minimal initial configuration and provide significant value to dbt projects.
My point here is extremely simple: don't be afraid to use external tools to enhance your pipeline. Without them, I would have to implement a lot of features myself, and I'm probably not the best coder. Here are a few examples.
Using Elementary, it was very easy to implement data observability and alerting.
Ever needed dataset comparisons, e.g. old version of the table on Prod vs new version on Dev? Try Datafold’s data-diff tool.
Wanna diff reports for Pull Requests? Try Piperider.
Wanna know how many credits your dbt models cost in Snowflake? Try dbt-snowflake-monitoring from SELECT.
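Adding any of these is typically a one-stanza change to `packages.yml` followed by `dbt deps`. A minimal sketch (the version ranges below are illustrative, so check each package’s docs for the current release):

```yaml
# packages.yml (version ranges are illustrative; pin per each package's docs)
packages:
  - package: elementary-data/elementary
    version: [">=0.13.0", "<0.14.0"]
  - package: get-select/dbt_snowflake_monitoring
    version: [">=5.0.0", "<6.0.0"]
```

Run `dbt deps` afterwards and the packages’ models and macros become available in your project.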
I can probably go on and on, but I guess you get the point.
Of course, the core functionality of dbt will continue to evolve. Just recall how in v1.7 they integrated the date_spine macro into the core codebase. This suggests that some features will eventually become part of the core framework. However, as an analytics engineer, your job is to deliver those features today. This is where external packages can help you.
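As a quick sketch of what that migration means in practice, the macro can now be called under the `dbt` namespace instead of `dbt_utils` (the model name and dates below are just an example):

```sql
-- models/utils/all_days_2024.sql: one row per day for 2024,
-- using the date_spine macro now shipped with dbt core (v1.7+)
{{ dbt.date_spine(
    datepart="day",
    start_date="cast('2024-01-01' as date)",
    end_date="cast('2025-01-01' as date)"
) }}
```

Same functionality as the old dbt_utils version, just one fewer package dependency.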
For the next year I expect to see even more new tools and ways to enhance dbt projects and make data models even better. Better testing, better alerting, better lineage (easy column-level lineage, please!) and who knows what else. Stay in touch!
💸 Better cost containment
(Special thanks to Zach for this point)
There is no single way to write dbt models, but there are practices suggested by dbt itself and the community. This hides one problem: they may not be ideal for every data stack. Here is why.
dbt is undoubtedly a great tool for abstracting the complexity of data engineering. It allows data analysts and analytics engineers to focus on data models rather than getting tangled up in the intricacies of DDL, incremental updates, and so on. However, there is a caveat. While the same code may perform exceptionally well in Snowflake, it may not work as effectively in other databases, such as BigQuery.
I once encountered an issue while working with dynamically partitioned tables in BigQuery. Despite having what seemed to be correct code, I consistently faced table full-scans. Upon investigation, I discovered that my is_incremental() condition, which I blindly copied from Snowflake code, did not function the same way in BigQuery. This triggered expensive full-scans. Reading the documentation helped me find the proper approach. Initially, the problem was not obvious since the code appeared to work and dbt did not raise any complaints. However, it was not optimal and incurred additional costs.
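To illustrate the pattern (not my exact code, and the model and column names here are hypothetical): on BigQuery, the incremental filter should hit the partition column so the engine can prune partitions instead of scanning the whole table.

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'}
) }}

select
    event_date,
    user_id,
    count(*) as events
from {{ ref('stg_events') }}
{% if is_incremental() %}
-- Filtering on the partition column lets BigQuery prune partitions.
-- _dbt_max_partition is a BigQuery-specific variable that dbt sets
-- to the max partition value of the existing table; a generic
-- "where updated_at > (select max(...) from {{ this }})" filter
-- copied from Snowflake would not prune anything here.
where event_date >= date(_dbt_max_partition)
{% endif %}
group by 1, 2
```

The generic Snowflake-style condition still produces correct results on BigQuery, which is exactly why the full-scan problem stays invisible until the bill arrives.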
That’s why learning model optimization should be one of your main priorities next year. In the current macroeconomic environment, cost containment is going to be a top priority for many companies.
Okay, since we mentioned the topic of learning, let's discuss this field separately.
🧑🏫 More ways to learn
Next year, there will be a lot of opportunities to learn dbt and Analytics Engineering. Knowledge is becoming increasingly accessible and easier to get. It is truly an exciting time to be in data!
I would like to suggest a few methods for learning the subject.
Courses
For beginners, I’d recommend starting with the official free courses from dbt Labs. There you will find a lot of topics, from fundamentals to advanced.
For middle and senior developers, there are plenty of high-quality paid courses. I can highly recommend “Analytics Engineering with dbt” from Emily Hawkins and “Advanced dbt” from Lindsay Murphy. I’ve taken both, and I truly believe these are top-notch courses!
Newsletters
Newsletters are a great way to acquire knowledge without actively participating in online sessions and scheduled calls, like in online courses. You receive periodic updates and can read news and tips at your own convenience asynchronously. Also, newsletters are great regardless of your experience level. Both juniors and seniors can find something useful in them.
Here are my recommendations:
First and foremost, subscribe to the official newsletter from dbt, the “Analytics Engineering Roundup”. I’ve been a subscriber for a long, long time and have learned quite a few things from it.
Next, try “Learn Analytics Engineering” from Madison Schott. This is a high-quality newsletter about everything Analytics Engineering, including tutorials and articles on general topics. Love it!
And finally, consider subscribing to my #dbtips newsletter, where I also talk about dbt and Analytics Engineering.
Podcasts
My only suggestion for podcasts is going to be "The Analytics Engineering Podcast" from dbt Labs (Spotify link, but I think you can find it in many other places as well). New episodes are released once or twice per month, so it won't be annoying.
Feel free to recommend any other podcasts (or any other learning resources) you found useful.
For sure, 2024 is going to be an awesome year for Analytics Engineering and for the data industry in general. There are a few challenges ahead, but also cool new technologies and breakthroughs. I’ll repeat it once again: it’s really an awesome time to be in data!
Happy New Year! 🥳