Utilize the output of another node in data quality
Is your feature request related to a problem? Please describe.
Say you want to ensure your input data is the same shape as your output data. This should be easy enough, but is not currently feasible. You could work around it by joining the two into a single tuple and writing a custom check to see whether the two items match, but that's awkward and requires the joined node to be on the blocking path.
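For reference, the workaround described above might look roughly like the sketch below. It is written as plain functions in the node style, with parameter names referring to upstream nodes, and it deliberately leaves out the decorator wiring. The node name `input_and_output`, the helper `indexes_match`, and the sample data are all made up for illustration, and "same shape" is read here as "same index".

```python
from typing import Tuple

import pandas as pd


def input_data() -> pd.DataFrame:
    # Stand-in for the real upstream node (hypothetical sample data).
    return pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])


def output_data(input_data: pd.DataFrame) -> pd.DataFrame:
    # Stand-in transform that keeps the index of its input.
    return input_data.assign(b=input_data["a"] * 2)


def input_and_output(
    input_data: pd.DataFrame, output_data: pd.DataFrame
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    # Extra node that exists only so a single check can see both frames.
    return input_data, output_data


def indexes_match(pair: Tuple[pd.DataFrame, pd.DataFrame]) -> bool:
    # The custom check: both frames must carry the same index, in order.
    left, right = pair
    return left.index.equals(right.index)
```

The awkward part is visible here: anything that should be gated by the check has to depend on `input_and_output` rather than on `output_data` directly.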
Describe the solution you'd like
TBD exactly -- will write out more later, but I think we can use source and value, defaulting to value.
@check_output(index_matches=source('input_data'))
def output_data() -> pd.DataFrame:
    # ...
Or...
@check_output.custom(CustomIndexMatcher(index_matches=source('input_data')))
def output_data() -> pd.DataFrame:
    # ...
Describe alternatives you've considered
See above, but nothing that clear.
Additional context
This came up in a conversation with an open-source user.
Two resources:
- Pydantic passing context info to validators: https://docs.pydantic.dev/latest/usage/validators/#validation-context
- Deal's ensure decorator receives function inputs/outputs: https://deal.readthedocs.io/basic/values.html#deal-ensure
Changes one might need to make:
1. Change this to add any non-static dependencies.
2. Loosen the validation here. Instead, wire through the type as the validator type: we can take the applies_to and make the parameter in (1) a union of the applicable types.
3. Change that to return validators in a delayed manner (e.g. a function that produces them given the constructor argument).
Basically we need to push this further downstream -- we only need to know (a) which validators to build and (b) which parameters they take in to build the DAG, then we can construct them at runtime. So, probably a complex change but not too many lines of code.
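A rough, library-independent sketch of that idea follows. Everything in it is hypothetical: `source` and `value` are local stand-ins for the markers mentioned above, and `DeferredValidator` and `IndexMatcher` are illustrative names, not existing classes. The point is only that the dependency names are available when the DAG is built, while the validator itself is constructed once the upstream outputs exist.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class source:
    """Stand-in marker: the argument comes from another node's output."""
    node_name: str


@dataclass(frozen=True)
class value:
    """Stand-in marker: the argument is a literal (the proposed default)."""
    literal: Any


@dataclass
class DeferredValidator:
    """Records which validator to build and which arguments it takes."""
    factory: Callable[..., Any]
    kwargs: Dict[str, Any] = field(default_factory=dict)

    def required_nodes(self) -> List[str]:
        # Known at DAG-construction time: the extra upstream nodes to wire in.
        return [a.node_name for a in self.kwargs.values() if isinstance(a, source)]

    def build(self, node_outputs: Dict[str, Any]) -> Any:
        # Called at runtime, once the upstream outputs are available.
        resolved = {}
        for name, arg in self.kwargs.items():
            if isinstance(arg, source):
                resolved[name] = node_outputs[arg.node_name]
            elif isinstance(arg, value):
                resolved[name] = arg.literal
            else:
                resolved[name] = arg  # a bare argument is treated as a value
        return self.factory(**resolved)


class IndexMatcher:
    """Toy validator: passes when the data carries the expected index labels."""

    def __init__(self, index_matches: List[Any]):
        self.expected = list(index_matches)

    def validate(self, index_labels: List[Any]) -> bool:
        return list(index_labels) == self.expected


# Declared up front, built later.
spec = DeferredValidator(IndexMatcher, {"index_matches": source("input_index")})
```

Here `spec.required_nodes()` gives the graph builder what it needs for (b), and `spec.build({...})` is the delayed construction step. A real implementation would also have to carry the applies_to type information through, which this sketch ignores.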