
Utilize the output of another node in data quality

Open elijahbenizzy opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe.

Say you want to ensure your input data is the same shape as your output data. This should be easy enough, but is not currently feasible. You could work around it by joining the two into a single tuple and writing a custom check to see if the two items match, but that's awkward and requires the joined node to be on the blocking path.
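For illustration, the workaround described above might look roughly like the sketch below. It uses plain Python lists as a stand-in for DataFrame indexes so it runs without Hamilton or pandas; `input_and_output` and `indices_match` are hypothetical names, not Hamilton APIs.

```python
from typing import List, Sequence, Tuple


def input_and_output(
    input_index: Sequence[object], output_index: Sequence[object]
) -> Tuple[List[object], List[object]]:
    """Extra node that exists only to put both values in one place."""
    return list(input_index), list(output_index)


def indices_match(pair: Tuple[List[object], List[object]]) -> bool:
    """Custom check that would be attached to the joined node."""
    left, right = pair
    return left == right
```

The awkward part is visible here: the check lives on an artificial node rather than on `output_data`, so something downstream has to depend on that node for the check to run.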

Describe the solution you'd like

TBD exactly -- will write out more later, but I think we can use source and value, defaulting to value.

@check_output(index_matches=source('input_data'))
def output_data() -> pd.DataFrame:
    # ...

Or...

@check_output.custom(CustomIndexMatcher(index_matches=source('input_data')))
def output_data() -> pd.DataFrame:
    # ...
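To make the "source and value, defaulting to value" idea concrete, here is a minimal sketch of how a check argument could be resolved at runtime. This is an assumption about the semantics, not the proposed implementation: `Source`, `Value`, `resolve`, and `index_matches` are stand-ins defined here, and `node_results` is a hypothetical dict of already-computed node outputs.

```python
from dataclasses import dataclass
from typing import Any, Dict, Sequence


@dataclass(frozen=True)
class Source:
    """Stand-in marker: take the argument from another node's output."""
    node_name: str


@dataclass(frozen=True)
class Value:
    """Stand-in marker: the argument is a literal."""
    literal: Any


def resolve(arg: Any, node_results: Dict[str, Any]) -> Any:
    """Turn a check argument into a concrete value."""
    if isinstance(arg, Source):
        return node_results[arg.node_name]
    if isinstance(arg, Value):
        return arg.literal
    # A bare argument is treated as a literal ("defaulting to value").
    return arg


def index_matches(
    output_index: Sequence[Any], expected: Any, node_results: Dict[str, Any]
) -> bool:
    """Compare the output's index with the resolved expectation."""
    return list(output_index) == list(resolve(expected, node_results))


node_results = {"input_data": [0, 1, 2]}
matches_upstream = index_matches([0, 1, 2], Source("input_data"), node_results)
matches_literal = index_matches([0, 1, 2], [0, 1, 2], node_results)
```

With this shape, existing checks that pass plain literals would keep working unchanged, and only arguments wrapped in the source marker would add a dependency on another node.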

Describe alternatives you've considered

See above, but nothing that clear

Additional context

In a talk with an OS user.

elijahbenizzy, Jun 01 '23 14:06

Two resources:

  • Pydantic passing context info to validators: https://docs.pydantic.dev/latest/usage/validators/#validation-context
  • Deal ensure decorator receives function inputs/outputs: https://deal.readthedocs.io/basic/values.html#deal-ensure

zilto, Jul 03 '23 13:07

Changes one might need to make:

  1. Change this to add any non-static dependencies
  2. Loosen the validation here -- instead wire through the type as the validator type -- we can take the applies_to and make the parameter in (1) a union of the applicable types
  3. Change that to return validators in a delayed manner (e.g. a function that builds them once it is given the constructor arguments).

Basically we need to push this further downstream -- we only need to know (a) which validators to build and (b) which parameters they take in to build the DAG, then we can construct them at runtime. So, probably a complex change but not too many lines of code.
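A rough sketch of that "push it downstream" idea, under stated assumptions: `DelayedValidator` and `IndexMatches` are hypothetical names, and none of this reflects Hamilton's actual internals. The spec exposes its upstream dependencies when the DAG is built, and only constructs the validator once those values exist.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class IndexMatches:
    """Hypothetical validator, built only once `expected` is known."""
    expected: List[Any]

    def validate(self, index: List[Any]) -> bool:
        return list(index) == list(self.expected)


@dataclass
class DelayedValidator:
    """Records which validator to build and which parameters it takes."""
    factory: Callable[..., Any]
    static_kwargs: Dict[str, Any] = field(default_factory=dict)
    # Maps a constructor parameter to the upstream node that supplies it.
    node_kwargs: Dict[str, str] = field(default_factory=dict)

    def dependencies(self) -> List[str]:
        """Upstream node names; all the DAG needs at build time."""
        return sorted(set(self.node_kwargs.values()))

    def build(self, node_results: Dict[str, Any]) -> Any:
        """Construct the validator at runtime from computed node outputs."""
        runtime_kwargs = {
            param: node_results[node] for param, node in self.node_kwargs.items()
        }
        return self.factory(**self.static_kwargs, **runtime_kwargs)


spec = DelayedValidator(IndexMatches, node_kwargs={"expected": "input_data"})
validator = spec.build({"input_data": [0, 1, 2]})
```

Here `spec.dependencies()` covers point (b), the parameters needed to wire the DAG, while `spec.build(...)` is the delayed construction from point 3.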

elijahbenizzy, Sep 27 '23 17:09