
Scaling Analytics with dbt @ Decathlon

Decathlon data stack
It’s been ~ 1 year since I began working with Decathlon.

Over the past year, I’ve led a deep transformation of how the Supply Data team uses dbt — scaling it from a patchy, under-optimized setup into a powerful, cost-efficient, and well-governed analytics pipeline running on Databricks.

👉 Here’s how we went from “it works” to “this scales”.

From dbt Cloud to dbt Core + Airflow

The first major turning point was migrating from dbt Cloud to dbt Core orchestrated via Airflow.

This gave us the control and flexibility we needed to handle a growing number of models, contributors, and use cases — while cutting licensing costs [*].
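
For illustration, here is a minimal sketch of what a dbt Core DAG looks like in Airflow. The project paths, DAG id, target, and schedule are placeholders, not our actual setup:

```python
# Minimal sketch of a dbt Core run orchestrated by Airflow.
# Paths, ids, target, and schedule are illustrative placeholders.
import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator

DBT_ARGS = "--project-dir /opt/dbt/supply --profiles-dir /opt/dbt --target prod"

with DAG(
    dag_id="dbt_supply_daily",
    schedule="0 6 * * *",                       # one explicit daily schedule per DAG
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    tags=["dbt", "supply"],
):
    # Build the models first, then run the tests so failures surface
    # as distinct Airflow task failures.
    dbt_run = BashOperator(task_id="dbt_run", bash_command=f"dbt run {DBT_ARGS}")
    dbt_test = BashOperator(task_id="dbt_test", bash_command=f"dbt test {DBT_ARGS}")

    dbt_run >> dbt_test
```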

When we decided to move from dbt Cloud to dbt Core + Airflow, we had two choices:

  1. Lift-and-shift: migrate everything as-is and maybe refactor later,
  2. Rebuild intentionally: take the opportunity to rethink architecture and deliver org-wide value from day one.

I chose the second path. Our dbt Cloud jobs were too consolidated (dozens of models in one DAG), and that structure just wouldn’t translate cleanly into Airflow.

I also wanted to seize this opportunity to build something that would help everyone in the organization who was struggling with Airflow.

Of course, we explored Cosmos — an open-source dbt-to-Airflow adapter — but ultimately opted out because of performance overhead. On our shared Airflow infrastructure, Cosmos added latency and complexity we couldn’t afford.

So we built our own: the Factory, a lightweight Airflow library tailored for analytics use cases. Designed for everyone in Decathlon Analytics (DA / BI / AE / DE, i.e. more than 250 people), it combined the flexibility of Python-native DAGs with opinionated defaults (the alerting hook is sketched right after this list):

  • Native alerting via Google Chat, Slack, and Email,
  • Tableau orchestration (yeah, automated extracts!),
  • Smart dbt test surfacing, so stakeholders could act early,
  • Event-driven scheduling to avoid wasteful runs,
  • Variable passing & exposure logging to improve auditability and observability.
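
The Factory itself is internal to Decathlon, but its alerting idea is easy to illustrate. Here is a hedged sketch of a Google Chat failure callback of the kind the library wires in by default; the Airflow Variable name and message format are assumptions, not the Factory's actual code:

```python
# Sketch of a Google Chat alert on task failure, attached via default_args.
# The Variable name and message layout are illustrative, not the Factory's real code.
import requests
from airflow.models import Variable


def notify_google_chat(context):
    """Airflow on_failure_callback: post the failing task and its log URL to a Chat space."""
    ti = context["task_instance"]
    message = (
        f"❌ {ti.dag_id}.{ti.task_id} failed (run {context['run_id']}).\n"
        f"Logs: {ti.log_url}"
    )
    webhook_url = Variable.get("google_chat_webhook")  # hypothetical Airflow Variable
    requests.post(webhook_url, json={"text": message}, timeout=10)


# Applied once for the whole DAG so every task inherits the callback:
default_args = {"on_failure_callback": notify_google_chat}
```

The same pattern extends to Slack and email, and a similar callback on dbt test tasks is what lets stakeholders act on failing tests early.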

This orchestration layer quickly became our backbone — and a springboard for serious performance and cost wins.

[*] Of course, you still have to pay for Airflow, but since Airflow instances at Decathlon are shared across teams, the cost mechanically drops compared to an analytics-only solution like dbt Cloud.


30% Cost Reduction, 6× Faster CI/CD

After establishing a robust foundation, we turned to efficiency. Through smarter scheduling, pipeline optimization (think incremental / microbatch runs & var scheduling, sketched after this list), and a ruthless focus on eliminating unnecessary tests and runs, we:

  • Cut dbt costs by ~30%,
  • Improved our CI/CD runtime roughly 6× (from 1 hour down to ~10 minutes).
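
To illustrate the var-scheduling part: the orchestrator passes the logical run date down to dbt, so incremental models only scan the slice they need instead of reprocessing full history. The tag, variable name, and paths below are assumptions:

```python
# Sketch: pass Airflow's logical date to dbt so incremental models only
# process that day's slice. Tag, var name, and paths are illustrative.
from airflow.operators.bash import BashOperator

# Declared inside the same DAG context as the other tasks.
dbt_incremental_run = BashOperator(
    task_id="dbt_run_daily_incremental",
    bash_command=(
        "dbt run --select tag:daily "
        "--project-dir /opt/dbt/supply --profiles-dir /opt/dbt --target prod "
        # Airflow renders {{ ds }} to the logical date (e.g. 2025-01-15);
        # models read it with var('run_date') in their incremental filter.
        "--vars '{\"run_date\": \"{{ ds }}\"}'"
    ),
)
```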

The difference was night and day: faster dev loops, fewer failed deployments, and a clearer path to production.
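
For context on the CI side: a common way to get from an hour down to minutes is a slim CI setup, where only the models touched by a pull request are built and tested, and everything upstream is deferred to production artifacts. A hedged sketch using dbt's programmatic runner; the artifact path and target name are assumptions, not necessarily our exact pipeline:

```python
# Slim CI sketch: build and test only what changed in the PR, deferring
# unchanged upstream models to production. Paths and names are illustrative.
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

result: dbtRunnerResult = dbt.invoke(
    [
        "build",
        "--select", "state:modified+",   # changed models plus their downstream dependencies
        "--defer",                       # read unchanged parents from prod instead of rebuilding
        "--state", "prod_artifacts/",    # manifest.json from the latest production run
        "--target", "ci",
    ]
)

if not result.success:
    raise SystemExit(1)  # fail the CI job when any model or test fails
```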


Building a Culture of Data Engineering Excellence

Scaling dbt is about more than infra. It’s about people.

We trained and coached 10 Data Analysts, 5 Data Scientists, and 3 Data Engineers to write performant, testable dbt code using best practices — so they could own their transformations, not just request them.

We delivered workshops on:

  • STAR and Snowflake modeling patterns (and how to use them well),
  • Modular, DRY SQL and the logic of reusable macros,
  • Effective testing strategies: data tests, unit tests, and CI validation,
  • How to manage environments (dev, preprod, prod) safely (see the sketch after this list).
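
On the environments point, the pattern we teach boils down to one dbt target per environment, with the orchestrator deciding which one to use, so nobody writes to prod by accident. A sketch of that idea, assuming a DBT_TARGET environment variable and target names that may differ from our real profiles:

```python
# Sketch: choose the dbt target from the environment so the same DAG code
# runs against dev, preprod, or prod. Names are illustrative.
import os
from airflow.operators.bash import BashOperator

DBT_TARGET = os.environ.get("DBT_TARGET", "dev")  # "dev" | "preprod" | "prod"
if DBT_TARGET not in {"dev", "preprod", "prod"}:
    raise ValueError(f"Unknown dbt target: {DBT_TARGET}")

# Each target maps to its own schema/catalog in profiles.yml,
# so dev runs can never overwrite production tables.
dbt_build = BashOperator(
    task_id="dbt_build",
    bash_command=f"dbt build --project-dir /opt/dbt/supply --profiles-dir /opt/dbt --target {DBT_TARGET}",
)
```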

We reviewed pull requests, co-designed transformations, and made sure everyone was equipped to deliver reliable, auditable outputs — independently.

The goal of this training was to make sure that everyone who wanted to take part in this dbt journey would be able to. And the response was enthusiastic: adoption across analysts surged.

dbt sparked so much interest on the DA / DS side that, after careful consideration, we decided to give them ownership of the aggregate layer (i.e. the layer where we prep data for BI / analysis work).

This meant that responsibility was divided as follows:

  • AEs were responsible for:
    • Organising the data: this meant creating an ERD, since we decided to ditch the OBT (One Big Table) for the Core Layer & go for Kimball instead,
    • Testing the data,
    • Creating frameworks for everybody to work with (e.g. testing, Airflow, etc.),
  • DAs / DSs were responsible for:
    • Understanding the core layer & how to use it,
    • Focusing on simple joins & aggregations.

Deliverables That Moved the Needle

Our work wasn’t theoretical. Tangible outcomes included:

  • A restructured data model built on STAR/Snowflake architecture
  • Defined KPIs and strategy for the analytics team
  • Massive cost and performance optimizations across the board
  • Clear technical guidance for scaling dbt responsibly
  • A sustainable dbt practice that scales beyond individuals

Lessons Learned

If I had to boil it down, scaling dbt for analytics at the org level comes down to a few core principles:

  1. Model for scale — STAR/Snowflake modeling isn’t old school, it’s maintainable,
  2. Own your orchestration — understand what’s going on & take ownership,
  3. Test smart, not just hard — unit tests + CI feedback loops = confident releases,
  4. Cost isn’t an afterthought — it’s at the core of Data Engineering,
  5. Teach, don’t gatekeep — upskilled analysts == a multiplier on delivery.

Conclusion

As my first year at Decathlon comes to an end, one thing is clearer than ever: scaling dbt isn’t just about performance, faster pipelines or cutting costs.

It’s also about enabling more people to safely build better data products — driving faster decisions, more confident experimentation, and scalable governance.

In other words: enabling better business outcomes through better data engineering practices & influence.



Interested in my expertise?

Struggling with your data analytics project? Let's fix it together.
I help teams solve tough data problems — fast.

Let's talk !