Add `Builder.with_materializers()`

Open zilto opened this issue 9 months ago • 0 comments

Following the discussion in #816, it would be beneficial to allow materializer nodes (both DataLoader and DataSaver) to be defined statically at the Driver level. Benefits:

- Materializer nodes can be called directly via .execute().
- Materializers appear in HamiltonGraph and visualizations even if they aren't executed.
- The DAG, including the materializers, can be validated before execution.
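
To make the validation point concrete, here is a toy sketch of what checking statically declared materializers against the DAG could look like before anything runs. It does not use Hamilton's internals: `MaterializerSpec` and `validate_materializers` are hypothetical names invented for illustration, and the node names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaterializerSpec:
    """Hypothetical stand-in for a statically declared materializer."""
    id: str
    dependencies: tuple[str, ...]


def validate_materializers(
    node_names: set[str], materializers: list[MaterializerSpec]
) -> list[str]:
    """Return human-readable problems; an empty list means the specs are valid."""
    problems = []
    seen_ids = set()
    for m in materializers:
        # A materializer id must not clash with a DAG node or another materializer.
        if m.id in node_names or m.id in seen_ids:
            problems.append(f"materializer id '{m.id}' collides with an existing node")
        seen_ids.add(m.id)
        # Every dependency must name a node that exists in the DAG.
        for dep in m.dependencies:
            if dep not in node_names:
                problems.append(f"materializer '{m.id}' depends on unknown node '{dep}'")
    return problems


# Made-up DAG and specs; the second spec misspells its dependency.
dag_nodes = {"raw_df", "features_df"}
specs = [
    MaterializerSpec(id="features_duckdb", dependencies=("features_df",)),
    MaterializerSpec(id="typo_saver", dependencies=("feature_df",)),
]
problems = validate_materializers(dag_nodes, specs)
```

With materializers known at build time, a check of this kind could surface the misspelled dependency when the Driver is built rather than partway through a run.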

Solution 1

from hamilton import driver
from hamilton.io.materialization import to

dr = (
  driver.Builder()
  .with_modules(...)
  .with_materializers(
    to.dlt(
      id="features_duckdb",
      dependencies=["features_df"],
      destination=duckdb_dest(...),
    )
  )
  .build()
)

Solution 2

An alternative would be to allow materializers to be imported and added via .with_modules(). For example, production_materializers.py contains:

# production_materializers.py
from hamilton.io.materialization import to

to.dlt(
  id="features__duckdb",
  dependencies=["features_df"],
  destination=duckdb_dest(...),
)

The driver script then imports both modules:

from hamilton import driver
import dataflow
import production_materializers

dr = driver.Builder().with_modules(dataflow, production_materializers).build()

For basic to.parquet() usage, it might be more efficient to store simple Python functions that call DataFrame.to_parquet() in a module to enable this pattern. More powerful materializers (e.g., dlt) would benefit from this approach, though.
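
As a rough illustration of the "simple Python function" alternative, a saver can be written as an ordinary function in a module passed to .with_modules(). The function name `features_parquet` and the `features_path` input are hypothetical; only `features_df` comes from the examples above.

```python
import pandas as pd


def features_parquet(features_df: pd.DataFrame, features_path: str) -> dict:
    """Write `features_df` to Parquet and return basic metadata about the write."""
    # DataFrame.to_parquet needs a Parquet engine (pyarrow or fastparquet) installed.
    features_df.to_parquet(features_path)
    return {"path": features_path, "rows": len(features_df)}
```

Because this is a regular node, it would show up in the graph and could be requested with something like dr.execute(["features_parquet"], inputs={"features_path": "features.parquet"}), at the cost of hard-coding the output format in the dataflow.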

May 02 '24 15:05 zilto
