Simple data validation in asset checks

Open jdangerx opened this issue 1 year ago • 0 comments

We want to make data validation checks easy to add to our assets.

I'm envisioning a future where we have something like:

@pv.weighted_bounds_checks(
    [
        pv.WeightedBoundsCheck(
            title="some_check",
            data_col="col1",
            weight_col="col2",
            low_bound=Bound(0.5, 2e5),
            high_bound=Bound(0.5, 6e5),
        ),
        ...
    ]
)
@pv.no_null_columns
@pv.row_count(12_345)
@asset
def my_cool_asset(upstream_asset):
    ...

Each of these decorators would then generate asset checks that show up in the Dagster UI and fail the ETL when a check does not pass.

We're hoping to achieve this in 10 hours or less of work, so we should start with the two very simple ones (row count + no null columns) and then work out the bounds one if we have time.
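To make the two simple checks concrete, here is a rough sketch of how stacked decorators could accumulate checks on an asset function. It is plain Python and deliberately does not touch Dagster: `_register`, `run_checks`, and the `_pv_checks` attribute are all hypothetical names, rows are modelled as a list of dicts rather than a DataFrame, and I'm assuming "no null columns" means "no column is entirely null". In a real version each registered check would presumably be turned into a Dagster asset check, and the decorators would have to cope with wrapping the output of `@asset` rather than a bare function.

```python
from typing import Any, Callable

Rows = list[dict[str, Any]]
CheckResult = tuple[bool, str]  # (passed, human-readable description)
Check = Callable[[Rows], CheckResult]


def _register(fn: Callable, name: str, check: Check) -> Callable:
    """Attach a named check to fn so that stacked decorators accumulate."""
    existing = getattr(fn, "_pv_checks", {})  # hypothetical attribute name
    fn._pv_checks = {**existing, name: check}
    return fn


def row_count(expected: int) -> Callable[[Callable], Callable]:
    """Register a check that the asset has exactly `expected` rows."""

    def decorator(fn: Callable) -> Callable:
        def check(rows: Rows) -> CheckResult:
            found = len(rows)
            return found == expected, f"expected {expected} rows, found {found}"

        return _register(fn, "row_count", check)

    return decorator


def no_null_columns(fn: Callable) -> Callable:
    """Register a check that no column is entirely null (assumed meaning)."""

    def check(rows: Rows) -> CheckResult:
        columns = {col for row in rows for col in row}
        all_null = sorted(
            col for col in columns if all(row.get(col) is None for row in rows)
        )
        return not all_null, f"entirely null columns: {all_null}"

    return _register(fn, "no_null_columns", check)


def run_checks(fn: Callable, rows: Rows) -> dict[str, CheckResult]:
    """Run every check registered on fn against the rows it produced."""
    return {name: check(rows) for name, check in getattr(fn, "_pv_checks", {}).items()}


@no_null_columns
@row_count(2)
def my_cool_asset() -> Rows:
    return [{"col1": 1.0, "col2": None}, {"col1": 2.0, "col2": None}]


results = run_checks(my_cool_asset, my_cool_asset())
for name, (passed, description) in results.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'} ({description})")
```

Storing the checks on the function keeps each decorator independent of the others, so adding the weighted bounds check later would only mean writing one more decorator that calls `_register`.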

jdangerx, Feb 20 '24 20:02