pudl
Simple data validation in asset checks
We want to make data validation checks easy to add to our assets.
I'm envisioning a future where we have something like:
@pv.weighted_bounds_checks(
    [
        pv.WeightedBoundsCheck(
            title="some_check",
            data_col="col1",
            weight_col="col2",
            low_bound=Bound(0.5, 2e5),
            high_bound=Bound(0.5, 6e5),
        ),
        ...
    ]
)
@pv.no_null_columns
@pv.row_count(12_345)
@asset
def my_cool_asset(upstream_asset):
    ...
These decorators would then generate a bunch of asset_checks that show up in the Dagster UI and fail the ETL when they don't pass.
We're hoping to achieve this in 10 hours or less of work, so we should start with the two very simple ones (row count + no null columns) and then work out the bounds one if we have time.
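To make the two simple checks concrete, here is a rough sketch of how stacked decorators could register checks on an asset function. It is plain Python with no Dagster dependency, so it only illustrates the shape of the idea. Everything in it is an assumption for illustration: `CheckResult` is a hypothetical stand-in for whatever result object the real checks would return, `run_checks` stands in for the orchestrator, a list of dicts stands in for a DataFrame, and "no null columns" is interpreted as "no column is entirely null".

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]
Table = list[Row]  # stand-in for a DataFrame; an assumption of this sketch


@dataclass
class CheckResult:
    """Hypothetical result object, not a Dagster class."""

    name: str
    passed: bool
    description: str


Check = Callable[[Table], CheckResult]


def _attach(fn, check: Check):
    """Record a check on the decorated function without changing its behavior."""
    fn.checks = [*getattr(fn, "checks", []), check]
    return fn


def row_count(expected: int):
    """Decorator factory: the asset must produce exactly `expected` rows."""

    def check(table: Table) -> CheckResult:
        n = len(table)
        return CheckResult(
            "row_count", n == expected, f"expected {expected} rows, found {n}"
        )

    return lambda fn: _attach(fn, check)


def no_null_columns(fn):
    """Decorator: fail if any column is entirely null (assumed interpretation)."""

    def check(table: Table) -> CheckResult:
        columns = {col for row in table for col in row}
        all_null = sorted(
            c for c in columns if all(row.get(c) is None for row in table)
        )
        return CheckResult(
            "no_null_columns", not all_null, f"entirely null columns: {all_null}"
        )

    return _attach(fn, check)


def run_checks(fn, *args) -> list[CheckResult]:
    """Build the asset once, then run every attached check against its output."""
    table = fn(*args)
    return [check(table) for check in fn.checks]


@no_null_columns
@row_count(2)
def my_cool_asset():
    # col2 is null in every row, so the no_null_columns check should fail.
    return [
        {"col1": 1.0, "col2": None},
        {"col1": 2.0, "col2": None},
    ]


results = run_checks(my_cool_asset)
for result in results:
    print(result.name, "PASSED" if result.passed else "FAILED", result.description)
```

In the real thing, each attached check would presumably be turned into a Dagster asset check rather than being collected on a function attribute, and the decorators would need to handle whatever object `@asset` returns. The sketch only shows that the simple checks reduce to small pure functions over the asset's output, which is part of why they look like a good starting point.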