
Dagster comparison is not correct

Open danielgafni opened this issue 10 months ago • 3 comments

Hi!

I was curious about Hamilton because I was looking for a lightweight DAG library.

Coming from Dagster, I naturally got interested in the Hamilton vs Dagster comparison and found this page in the docs: https://hamilton.dagworks.io/en/latest/code-comparisons/dagster/

I noticed it does not provide accurate information about Dagster, and the code examples do not use some of Dagster's main features. More concretely:

  • issues with the first example:
    • It does not utilize the IOManager to decouple I/O from computations
    • It incorrectly states that asset descriptions have to be defined via metadata. They can also be defined in native function docstrings or via the @asset(description=...) argument.
  • issues with the second example:
    • it incorrectly states that the Dagster job can't be executed in a local Python process
    • it incorrectly states that I/O and computations are coupled (duplicate)
    • the comparison between loading environment variables at runtime and providing configuration-time references like dagster.EnvVar does not make much sense. Dagster's configuration purposely enables deferring the setting of the exact configuration parameters (since Dagster runs can be executed remotely, e.g. in a Kubernetes pod, and the env var might not be available outside of the remote system). But nothing prevents the user from setting the value with os.getenv directly if needed.
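
To make the last point concrete, here is a minimal standard-library sketch of the difference between reading an environment variable where the configuration is built and passing a reference that is resolved only when the run starts. DeferredEnvVar, build_config_eager, build_config_deferred, run and DEMO_API_KEY are hypothetical names used for illustration. This is not how dagster.EnvVar is implemented, only the general idea behind a configuration-time reference.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeferredEnvVar:
    """Hypothetical stand-in for a configuration-time reference to an env var."""

    name: str

    def resolve(self) -> str:
        # The lookup happens only when the run starts, e.g. inside a remote pod.
        value = os.environ.get(self.name)
        if value is None:
            raise KeyError(f"environment variable {self.name!r} is not set")
        return value


def build_config_eager(name: str) -> dict:
    # Eager: the value is read in the process that builds the configuration.
    return {"api_key": os.getenv(name)}


def build_config_deferred(name: str) -> dict:
    # Deferred: only the variable name is stored in the configuration.
    return {"api_key": DeferredEnvVar(name)}


def run(config: dict) -> Optional[str]:
    # Resolve deferred references at execution time; pass plain values through.
    api_key = config["api_key"]
    if isinstance(api_key, DeferredEnvVar):
        api_key = api_key.resolve()
    return api_key
```

If the variable is set only in the environment where the run executes, the eager configuration carries None while the deferred one still resolves to the right value. Both styles remain available to the user.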

Minor (in the main comparison table):

  • important and unique Dagster features such as Declarative Materialization and Pipes are not mentioned
  • the data versioning comparison is a bit strange: it's not very clear how Hamilton automatically identifies code versions (e.g. how it distinguishes between refactoring-like changes and changes in the actual business logic). Dagster's data versioning system enforces explicit code version management to avoid unwanted expensive materializations of the entire asset graph (see: declarative automation).
  • important Dagster integrations such as dagster-dbt are not mentioned
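
The data versioning concern above can be illustrated with a small standard-library sketch. It assumes a naive automatic scheme that hashes the raw source text. That scheme is an assumption made for illustration, not a description of how Hamilton or Dagster compute versions. All names here (source_fingerprint, needs_rematerialization, EXPLICIT_VERSIONS) are hypothetical.

```python
import hashlib


def source_fingerprint(source: str) -> str:
    """Derive a version from the raw source text (a naive automatic scheme)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]


def needs_rematerialization(old_version: str, new_version: str) -> bool:
    # Downstream data is considered stale whenever the code version changes.
    return old_version != new_version


ORIGINAL = "def total(prices):\n    return sum(prices)\n"
# Refactoring-like change: only a comment is added, the logic is identical.
REFACTORED = "def total(prices):\n    # sum all prices\n    return sum(prices)\n"
# Business-logic change: the result is different.
CHANGED = "def total(prices):\n    return sum(prices) * 1.2\n"

# Explicit versions are bumped by hand, and only when the logic changes.
EXPLICIT_VERSIONS = {"original": "1", "refactored": "1", "changed": "2"}

auto_on_refactor = needs_rematerialization(
    source_fingerprint(ORIGINAL), source_fingerprint(REFACTORED)
)
explicit_on_refactor = needs_rematerialization(
    EXPLICIT_VERSIONS["original"], EXPLICIT_VERSIONS["refactored"]
)
explicit_on_change = needs_rematerialization(
    EXPLICIT_VERSIONS["original"], EXPLICIT_VERSIONS["changed"]
)
```

Under this naive scheme a comment-only edit already triggers a recomputation, while an explicit version does so only when the author bumps it. A smarter automatic scheme could normalize the source first, which is exactly the detail the comparison page leaves unclear.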

Current behavior

The Dagster example is not using relevant Dagster features and provides inaccurate information.

Expected behavior

The comparison between Hamilton and Dagster should use analogous features in both frameworks to be fair. In particular, it should use the IOManager as it's one of the main selling points of Dagster:

import dagster as dg
import pandas as pd

@dg.asset
def topstory_ids() -> pd.DataFrame: ...

@dg.asset
def topstories(topstory_ids: pd.DataFrame): ...

Note that some of the popular IOManagers for Pandas and Polars also support loading a subset of the dataframe columns: @asset(metadata={"columns": ["title"]}).

It should also provide accurate information on other topics mentioned above.

Additional context

Technically, this is not a bug, but I couldn't find a better label for this issue.

I am willing to help with improving these docs if my help is considered welcome!

danielgafni, Mar 03 '25 22:03

Thanks for raising, @danielgafni. I believe we took the example from here. Otherwise, some of the material may be dated, given that things were correct at the time of publishing. E.g. executing in a local process was only considered for testing, not production use, at the time. I think that statement is correct from a comparison standpoint if that still holds.

Feel free to make a PR and suggest edits. This should all be under /docs/.

skrawcz, Mar 03 '25 23:03

Otherwise you'll find Hamilton inspired a few of the Dagster APIs ;)

skrawcz, Mar 03 '25 23:03

> I believe we took the example from here.

Uh interesting! Thanks for the pointer, this explains a lot :) I think this example was simplified on purpose.

> testing not production use at the time

Definitely true, I would not consider running stuff locally a production practice :) But there are also other types of workflows where Dagster can be used locally: integration testing, obviously, but also more esoteric use cases such as this one.

> Feel free to make a PR and suggest edits.

Sure, I can make an update.

> Otherwise you'll find Hamilton inspired a few of the Dagster APIs ;)

Cool, which ones?


P.S. By the way, I was looking into Hamilton in order to bring some structure and orchestration to Dagger workflows.

danielgafni, Mar 03 '25 23:03