Utility for generating temporary, unique node names
Issue by elijahbenizzy
Tuesday Nov 15, 2022 at 15:41 GMT
Originally opened as https://github.com/stitchfix/hamilton/issues/230
We have a lot of cases (coming up) in which we generate unique/temporary nodes in decorator/DAG construction.
E.g.
- generating a node in the new parameterized and extract_columns combo decorator
- generating static/pass-through nodes for the new reuse_functions decorator
And a few more that we already do but I honestly can't remember right now...
Currently these have the potential of clashing with each other, but I think we can do this in a much cleaner way. Properties we want:
(1) unique
(2) readable
(3) stable between runs
(4) stable between DAG changes
TBD on implementation -- but I think a stable(ish) hash with a prefix and a low-digit number for collisions. If we toss readability a hash/uuid is fine.
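To make the idea concrete, here is a rough sketch of the "prefix + stable hash + low-digit collision counter" approach. This is not Hamilton code: `temp_node_name`, its parameters, and the `taken` set are all hypothetical names made up for illustration, and it assumes the caller can supply the set of names already present in the DAG.

```python
import hashlib
from typing import Iterable, Set


def temp_node_name(
    prefix: str, parts: Iterable[str], taken: Set[str], digest_len: int = 8
) -> str:
    """Hypothetical helper (not part of Hamilton): build a readable,
    deterministic node name that does not clash with anything in `taken`."""
    # Use hashlib rather than the built-in hash(): str hashes are salted
    # per process by default, so hash() is not stable between runs.
    key = "\x1f".join(parts)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:digest_len]
    base = f"{prefix}_{digest}"

    # Low-digit counter, appended only when the base name is already taken.
    candidate = base
    suffix = 0
    while candidate in taken:
        suffix += 1
        candidate = f"{base}_{suffix}"

    taken.add(candidate)  # reserve the name for later calls
    return candidate


# Seed `taken` with user-defined node names so generated names avoid them.
taken = {"raw", "validated"}
first = temp_node_name("check_output", ["validated", "schema"], taken)
second = temp_node_name("check_output", ["validated", "schema"], taken)
print(first, second)
```

The prefix keeps the name readable and the hash keeps it stable between runs. The counter suffix depends on registration order, so this only partially covers property (4): names that needed a suffix could shift if the DAG changes.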
We ran into a problem with this simple test case:
@check_output(schema=schema, importance="fail")
def validated(raw: pd.DataFrame) -> pd.DataFrame:
    """Validate the raw input data."""
    return raw
in which the validated function was not correctly decorated and so the schema was not applied. The problem was resolved by renaming the function. It appeared there was a clash between the name of our node and a node generated internally by the decorator. The dataflow ran in spite of the clash and there was no warning that the schema was not being applied. Thanks to @elijahbenizzy for helping debug and sort this out.
marking as stale for now.