Parent task: Content on Kedro vs complementary tools

Open merelcht opened this issue 2 years ago • 21 comments

Description (edit 06/09/2023)

The Kedro docs are missing a clear description of Kedro's value proposition compared with other tools.

A related topic is migration guides that explain how to move from tool X to Kedro.

Ideas

  • article pages in docs
  • content pages for the website to explain where Kedro fits in the "ecosystem" of tools (see https://ably.com/topics). I already have a ticket in the kedro-website project to design these so we can publish them in Contentful.
  • a matrix of Kedro compared to X for various technologies that we may be compared to (rightly or wrongly). See https://ably.com/compare
  • videos (script them, work out who makes them later)
  • blogs
  • partner with plugin developer teams on content creation and run webinar showcases

merelcht avatar Nov 23 '22 15:11 merelcht

Let's compile a list of these "competitor/complementary" platforms.

Category 1:

Category 2:

Category 3:

Category 4:

This is something I'll do this week; I've earmarked some time...

stichbury avatar Nov 29 '22 13:11 stichbury

This is from some recent slide decks.

image

astrojuanlu avatar Feb 24 '23 11:02 astrojuanlu

Evidence that this could be useful for some users (private communication):

like it [Kedro] a lot, it's very versatile and interesting and above all the way it works, when you take the roll it speeds up [the development process] a lot (I think that's its goal, to make it reproducible). What I would like to have clearer is how it fits or differs from mlflow

astrojuanlu avatar Aug 14 '23 14:08 astrojuanlu

I think we should abstain from doing blog posts or promotional content about this. People ask very frequently about Kedro vs MLflow (happened to me last week), Kedro vs dbt (happened to me a minute ago), and Kedro vs DVC, and this should be explained more prominently in the documentation.

I'm advocating for moving this to https://github.com/kedro-org/kedro/ and raising its priority.

astrojuanlu avatar Sep 06 '23 09:09 astrojuanlu

Sure, let's do this.

  • [x] 1. Move this ticket as "Kedro vs comparable tools", and make it a parent with a prioritized list of comparable tools
  • [ ] 2. Create a set of child tickets for each tool and execute according to priority in parent. Each "article" (could be a video, graphic, whatever, but let's assume text for now) needs to have sections on similarities, differences, pros and cons, and how to migrate to Kedro from the other tool
  • [x] 3. Create a new parent ticket "Kedro + tools" where we write about complementary products as opposed to comparable products. Likewise prioritize what we'll add as complementary tools. This is https://github.com/kedro-org/kedro/issues/2817
  • [ ] 4. Create child tickets as per 2.

@astrojuanlu Could you assist me with the lists? I have this big set of potential tools but need help deciding whether they're in group 2 or 4, and what their priorities are.

  • Build-your-own <--comparable (Kedro vs. X)
  • Cookiecutter <--comparable (Kedro vs. X)
  • Dagster
  • dbt
  • DVC
  • Great Expectations <-- complementary (Kedro + X)
  • Hamilton <--comparable (Kedro vs. X)
  • Intake
  • MLflow <-- complementary (Kedro + X)
  • Orchestration platforms (various) <-- complementary (Kedro + X)
  • Pachyderm
  • Ploomber
  • ZenML

stichbury avatar Sep 06 '23 10:09 stichbury

Let's start with MLflow, dbt, DVC. The other ones are smaller and can be tackled at a later stage I think.

astrojuanlu avatar Sep 06 '23 11:09 astrojuanlu

Could you help me categorise the others, since MLflow isn't a comparable tool but a complementary one? I'll jot down which I think are which, and that'll help with deciding on the template for each type of article.

stichbury avatar Sep 06 '23 11:09 stichbury

Note that MLflow now has MLflow Recipes (previously MLflow Pipelines) https://mlflow.org/docs/latest/recipes.html, so it can be considered a comparable tool.

image

See also the official announcement https://www.databricks.com/blog/2022/06/29/introducing-mlflow-pipelines-with-mlflow-2-0.html

astrojuanlu avatar Sep 06 '23 12:09 astrojuanlu

Also adding smart notebooks viz https://deepnote.com/blog/jupyter-notebook-alternative and https://hex.tech/

stichbury avatar Oct 17 '23 10:10 stichbury

Google's opinion:

image

So let's do:

  • MLflow
  • Airflow
  • dbt
  • DVC
  • Prefect
  • and maybe Dagster next

astrojuanlu avatar Jan 22 '24 10:01 astrojuanlu

Could we take some of the content that @NeroOkwa presented in his competitor analysis for this?

merelcht avatar Mar 27 '24 17:03 merelcht

I think it's much better to focus first on "how to use Kedro and X" (https://github.com/kedro-org/kedro/issues/3012#issuecomment-1903751448) rather than "why to use Kedro instead of X/differences & similarities between Kedro and X" (@NeroOkwa's competitor analysis).

astrojuanlu avatar Apr 04 '24 08:04 astrojuanlu

MLflow is done, Airflow is sufficiently covered in https://docs.kedro.org/en/stable/deployment/airflow.html

I'm shifting my focus to MLOps integrations for the next couple of months before coming back to this. Will add more details later.

astrojuanlu avatar Jun 06 '24 05:06 astrojuanlu

Maybe Kedro and SQLMesh as an alternative to dbt?

image

(source https://juhache.substack.com/p/multi-engine-stacks-deserve-to-be)

astrojuanlu avatar Jul 21 '24 22:07 astrojuanlu

Maybe Kedro and SQLMesh as an alternative to dbt?

image

(source https://juhache.substack.com/p/multi-engine-stacks-deserve-to-be)

Just to confirm, Kedro and SQLMesh as an alternative to dbt, or Kedro as an alternative to SQLMesh and dbt?

SQLMesh is already a direct competitor to dbt, so I think the latter makes sense. From the linked article:

If you think this could easily be run as a vanilla python function outside of SQLMesh: You’re right!

But what’s nice about SQLMesh is that you can add audits to run built-in data tests based on the pandas dataframe this returns.

I think Kedro could definitely be a great fit in these situations, or in general for Python projects, and we should push that. I like the approach of showing the similarities, but focusing on how you can get similar value while working with Python. If we can get it used in something like the above project, that would be amazing!

deepyaman avatar Jul 22 '24 13:07 deepyaman

Today we explored the possibility of showcasing how to use dlt with Kedro. Let's do it next.

astrojuanlu avatar Jul 30 '24 15:07 astrojuanlu

And let's include Delta & Iceberg too, which aren't by any means similar tools but can be used alongside Kedro successfully.

astrojuanlu avatar Aug 01 '24 09:08 astrojuanlu

Summary: in the coming months let's document

  • [ ] Delta (& Apache Iceberg)
  • [ ] DVC
  • [ ] OpenTelemetry w/ Logfire
  • [ ] dlt

And for full clarity, we're focusing on complementary, and not competitive, tools for now. I think unbiased comparisons are very hard to get right and the onus should be on the user to do their due diligence and reach their own conclusions.

astrojuanlu avatar Aug 01 '24 11:08 astrojuanlu

Today we explored the possibility of showcasing how to use dlt with Kedro. Let's do it next.

Issue documenting initial options: https://github.com/kedro-org/kedro/issues/4057

deepyaman avatar Aug 02 '24 20:08 deepyaman

Adding DVC #2691

astrojuanlu avatar Sep 20 '24 15:09 astrojuanlu

Tweaking the OpenTelemetry work item to explicitly include Logfire #3978

astrojuanlu avatar Sep 23 '24 08:09 astrojuanlu