
Add `Builder.with_materializers()`

Open · zilto opened this issue 9 months ago · 0 comments

Following the discussion in #816, it would be beneficial to allow materializer nodes (both DataLoader and DataSaver) to be defined statically at the Driver level:

  • Materializer nodes could be requested directly via `.execute()`.
  • Materializers would appear in `HamiltonGraph` and in visualizations even if they aren't executed.
  • The DAG, including the materializers, could be validated before execution.

Solution 1

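```python
from hamilton import driver
from hamilton.io.materialization import to

dr = (
  driver.Builder()
  .with_modules(...)
  .with_materializers(  # proposed API
    to.dlt(
      id="features_duckdb",
      dependencies=["features_df"],
      destination=duckdb_dest(...),  # placeholder for a dlt destination
    )
  )
  .build()
)
```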
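To make the three benefits listed at the top concrete, here is a toy, framework-free sketch of what registering materializers at build time could mean. `ToyBuilder`, `Materializer`, `with_functions`, `execute`, and `fake_save` are hypothetical names, not Hamilton's API, and nodes are zero-argument functions to keep it short. It shows a materializer becoming a named node in the graph, having its dependencies validated in `build()` before anything runs, and being requested directly by name.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Materializer:
    """Hypothetical stand-in for a saver node such as to.dlt(...)."""
    id: str
    dependencies: List[str]
    save: Callable[..., dict]


class ToyBuilder:
    """Hypothetical builder that registers materializers at build time."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[], object]] = {}
        self.materializers: List[Materializer] = []

    def with_functions(self, *fns: Callable[[], object]) -> "ToyBuilder":
        for fn in fns:
            self.nodes[fn.__name__] = fn
        return self

    def with_materializers(self, *mats: Materializer) -> "ToyBuilder":
        self.materializers.extend(mats)
        return self

    def build(self) -> Dict[str, object]:
        # Validate every materializer's dependencies before anything executes.
        for mat in self.materializers:
            missing = [d for d in mat.dependencies if d not in self.nodes]
            if missing:
                raise ValueError(f"{mat.id}: unknown dependencies {missing}")
        graph: Dict[str, object] = dict(self.nodes)
        for mat in self.materializers:
            # Materializers become named nodes, so they show up in the graph.
            graph[mat.id] = mat
        return graph


def execute(graph: Dict[str, object], target: str) -> object:
    """Compute `target`; a materializer first computes its dependencies."""
    node = graph[target]
    if isinstance(node, Materializer):
        inputs = {dep: execute(graph, dep) for dep in node.dependencies}
        return node.save(**inputs)
    return node()


def features_df() -> List[int]:
    return [1, 2, 3]


def fake_save(features_df: List[int]) -> dict:
    # Stand-in for writing to a destination; just reports what it received.
    return {"rows": len(features_df)}


graph = (
    ToyBuilder()
    .with_functions(features_df)
    .with_materializers(
        Materializer(id="features_duckdb", dependencies=["features_df"], save=fake_save)
    )
    .build()
)
print(sorted(graph))                      # ['features_df', 'features_duckdb']
print(execute(graph, "features_duckdb"))  # {'rows': 3}
```

Because validation happens in `build()`, a typo in `dependencies` fails before any data is loaded or saved, which is the main argument for declaring materializers on the Builder rather than at execution time.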

Solution 2

An alternative would be to allow materializers to be imported and added via `.with_modules()`. For example, `production_materializers.py` contains:

```python
# production_materializers.py
from hamilton.io.materialization import to

to.dlt(
  id="features__duckdb",
  dependencies=["features_df"],
  destination=duckdb_dest(...),  # placeholder for a dlt destination
)
```

The Driver then loads it like any other module:

```python
from hamilton import driver
import dataflow
import production_materializers

dr = driver.Builder().with_modules(dataflow, production_materializers).build()
```

For basic `to.parquet()` usage, it might be simpler to write plain Python functions that call `DataFrame.to_parquet()` in a module to enable this pattern. More powerful materializers (e.g., dlt) would still benefit from this approach.
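One question Solution 2 raises is how `.with_modules()` would find the materializers in `production_materializers.py`. The minimal sketch below assumes a crawler that scans module-level names for materializer objects; `Materializer` and `collect_materializers` are hypothetical, and the module is built in memory as a stand-in for the real file. Under that assumption, a materializer created by a bare call that is not bound to a name is evaluated and discarded at import time, so a crawler of this kind would not find it.

```python
import types
from dataclasses import dataclass
from typing import List


@dataclass
class Materializer:
    """Hypothetical stand-in for the object returned by to.dlt(...)."""
    id: str
    dependencies: List[str]


def collect_materializers(*modules: types.ModuleType) -> List[Materializer]:
    """Return every Materializer bound to a module-level name."""
    found: List[Materializer] = []
    for module in modules:
        for value in vars(module).values():
            if isinstance(value, Materializer):
                found.append(value)
    return found


# In-memory stand-in for production_materializers.py.
SOURCE = '''
features__duckdb = Materializer(id="features__duckdb", dependencies=["features_df"])
Materializer(id="unbound", dependencies=["features_df"])  # bare call: discarded
'''
production_materializers = types.ModuleType("production_materializers")
production_materializers.Materializer = Materializer
exec(SOURCE, vars(production_materializers))

print([m.id for m in collect_materializers(production_materializers)])
# ['features__duckdb']
```

If Solution 2 were adopted with name-based discovery, the convention would likely need to be "assign each materializer to a module-level variable"; a registry that records materializers as they are constructed would be another option, at the cost of more global state.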

zilto, May 02 '24 15:05