Parent task: Content on Kedro vs complementary tools
Description (edit 06/09/2023)
The Kedro docs are missing a clear description of Kedro's value proposition compared with other tools.
A related topic is migration guides on how to move from tool X to Kedro.
Ideas
- article pages in docs
- content pages for the Website to explain where Kedro fits in the "ecosystem" of tools (see https://ably.com/topics). I already have a ticket in the kedro-website project to design this so we publish them in Contentful.
- a matrix of Kedro compared to X for various technologies that we may be compared to (rightly or wrongly). See https://ably.com/compare
- videos (script them, work out who makes them later)
- blogs
- partner with plugin developer teams on content creation and run webinar showcases
Let's compile a list of these "competitor/complementary" platforms.
Category 1:
Category 2:
Category 3:
Category 4:
This is something I'll do this week, I've earmarked some time...
This is from some recent slide decks.

Evidence that this could be useful for some users (private communication):
like it [Kedro] a lot, it's very versatile and interesting and above all the way it works, when you take the roll it speeds up [the development process] a lot (I think that's its goal, to make it reproducible). What I would like to have clearer is how it fits or differs from mlflow
I think we should refrain from writing blog posts or promotional content about this. People ask very frequently about Kedro vs MLflow (happened to me last week), Kedro vs dbt (happened to me a minute ago), and Kedro vs DVC, and this should be explained more prominently in the documentation.
I'm advocating for moving this to https://github.com/kedro-org/kedro/ and raising its priority.
Sure, let's do this.
- [x] 1. Move this ticket as "Kedro vs comparable tools" and make it a parent with a prioritized list of comparable tools
- [ ] 2. Create a set of child tickets, one for each tool, and execute according to the priority in the parent. Each "article" (could be a video, graphic, whatever, but let's assume text for now) needs to have sections on similarities, differences, pros and cons, and how to migrate to Kedro from the other tool
- [x] 3. Create a new parent ticket "Kedro + tools" where we write about complementary products as opposed to comparable products. Likewise prioritize what we'll add as complementary tools. This is https://github.com/kedro-org/kedro/issues/2817
- [ ] 4. Create child tickets as per 2.
@astrojuanlu Could you assist me with the lists? I have this big set of potential tools but need help deciding whether each belongs in group 2 or group 4, and what its priority should be.
- Build-your-own <--comparable (Kedro vs. X)
- Cookiecutter <--comparable (Kedro vs. X)
- Dagster
- dbt
- DVC
- Great Expectations <-- complementary (Kedro + X)
- Hamilton <--comparable (Kedro vs. X)
- Intake
- MLflow <-- complementary (Kedro + X)
- Orchestration platforms (various) <-- complementary (Kedro + X)
- Pachyderm
- Ploomber
- ZenML
Let's start with MLflow, dbt, DVC. The other ones are smaller and can be tackled at a later stage I think.
Could you help me categorise the others, given that MLflow is a complementary tool rather than a comparable one? I'll jot down which I think are which, and that'll help with deciding on the template for each type of article.
Note that MLflow now has MLflow Recipes (previously MLflow Pipelines), https://mlflow.org/docs/latest/recipes.html, so it can be considered a comparable tool.
See also the official announcement https://www.databricks.com/blog/2022/06/29/introducing-mlflow-pipelines-with-mlflow-2-0.html
Also adding smart notebooks viz https://deepnote.com/blog/jupyter-notebook-alternative and https://hex.tech/
Google's opinion:
So let's do:
- MLflow
- Airflow
- dbt
- DVC
- Prefect
- and maybe Dagster next
Could we take some of the content that @NeroOkwa presented in his competitor analysis for this?
I think it's much better to focus first on "how to use Kedro and X" (https://github.com/kedro-org/kedro/issues/3012#issuecomment-1903751448) rather than "why to use Kedro instead of X/differences & similarities between Kedro and X" (@NeroOkwa's competitor analysis).
MLflow is done, and Airflow is sufficiently covered in https://docs.kedro.org/en/stable/deployment/airflow.html
I'm shifting my focus to MLOps integrations for the next couple of months before coming back to this. Will add more details later.
Maybe Kedro and SQLMesh as an alternative to dbt?
(source https://juhache.substack.com/p/multi-engine-stacks-deserve-to-be)
Just to confirm, Kedro and SQLMesh as an alternative to dbt, or Kedro as an alternative to SQLMesh and dbt?
SQLMesh is already a direct competitor to dbt, so I think the latter makes sense. From the linked article:
If you think this could easily be run as a vanilla python function outside of SQLMesh: You’re right!
But what’s nice about SQLMesh is that you can add audits to run built-in data tests based on the pandas dataframe this returns.
I think Kedro could definitely be a great fit in these situations, or in general for Python projects, and we should push that. I like the approach of showing the similarities, but focusing on how you can get similar values while working with Python. If we can get it used in something like the above project, that would be amazing!
Today we explored the possibility of showcasing how to use dlt with Kedro. Let's do it next.
And let's include Delta & Iceberg too, which aren't by any means similar tools but can be used alongside Kedro successfully.
Summary: in the coming months let's document
- [ ] Delta (& Apache Iceberg)
- [ ] DVC
- [ ] OpenTelemetry w/ Logfire
- [ ] dlt
And for full clarity, we're focusing on complementary, and not competitive, tools for now. I think unbiased comparisons are very hard to get right and the onus should be on the user to do their due diligence and reach their own conclusions.
Today we explored the possibility of showcasing how to use dlt with Kedro. Let's do it next.
Issue documenting initial options: https://github.com/kedro-org/kedro/issues/4057
Adding DVC #2691
Tweaking the OpenTelemetry work item to explicitly include Logfire #3978