Add `Builder.with_materializers()`
Following the discussion in #816, it would be beneficial to allow materializer nodes (both `DataLoader` and `DataSaver`) to be defined statically at the Driver level. This would have several benefits:
- The nodes can be called directly via `.execute()`.
- Materializers appear in `HamiltonGraph` and visualizations even if they aren't executed.
- Validate the DAG, including the materializers, before execution.
Solution 1
dr = (
driver.Builder()
.with_modules(...)
.with_materializers(
to.dlt(
id="features_duckdb",
dependencies=["features_df"],
destination=duckdb_dest(...),
)
)
.build()
)
Solution 2
An alternative would be to allow materializers to be imported and added via `.with_modules()`. For example, `production_materializers.py` contains:
# production_materializers.py
from hamilton.io.materialization import to
to.dlt(
id="features__duckdb",
dependencies=["features_df"],
destination=duckdb_dest(...),
)
The module is then passed to the driver alongside the dataflow:
from hamilton import driver
import dataflow
import production_materializers
dr = driver.Builder().with_modules(dataflow, production_materializers).build()
For basic `to.parquet()` usage, it might be more efficient to store simple Python functions that call `pd.DataFrame.to_parquet()` in a module to enable this pattern. More powerful materializers (e.g., dlt) would benefit from this approach, though.
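
The plain-function alternative for the basic Parquet case could look like the minimal sketch below. The module name `parquet_savers.py`, the function name `saved_features_parquet`, the `features_path` input, and the choice to return the written path are all hypothetical. The sketch only assumes an upstream node named `features_df`, as in the examples above.

```python
# parquet_savers.py (hypothetical module name)
import pandas as pd


def saved_features_parquet(features_df: pd.DataFrame, features_path: str) -> str:
    """Write `features_df` to a Parquet file and return the path written.

    `features_df` is the upstream node from the examples above;
    `features_path` is a hypothetical input supplied at execution time.
    """
    # DataFrame.to_parquet needs a Parquet engine (e.g. pyarrow) installed.
    features_df.to_parquet(features_path)
    return features_path
```

Because this is an ordinary function, it would presumably be picked up by `.with_modules(dataflow, parquet_savers)` like any other node and requested by name through `.execute()`, with `features_path` passed via `inputs`. The trade-off is that the save logic is hand-written rather than reusing the `to.parquet()` materializer.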