
Utility for generating temporary, unique node names

Open • HamiltonRepoMigrationBot opened this issue 2 years ago • 1 comment

Issue by elijahbenizzy, Tuesday Nov 15, 2022 at 15:41 GMT. Originally opened as https://github.com/stitchfix/hamilton/issues/230


We have a number of upcoming cases in which we generate unique/temporary nodes during decorator/DAG construction.

For example:

  • generating a node in the new parameterized and extract_columns combo decorator
  • generating static/pass-through nodes for the new reuse_functions decorator

And a few more that we already do but that I honestly can't recall right now. Currently these generated names can clash with each other, and I think we can handle this much more cleanly. Properties we want:

(1) unique, (2) readable, (3) stable between runs, and (4) stable between DAG changes.

The implementation is TBD, but I'm thinking of a stable(ish) hash with a prefix, plus a low-digit counter to resolve collisions. If we give up on readability, a plain hash/UUID is fine.
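A minimal sketch of the prefix + stable hash + collision counter idea, assuming a hypothetical TempNodeNamer helper and a "__" prefix convention for internal nodes (neither is part of Hamilton). The prefix keeps names readable; a SHA-256 digest of the decorator's own context (for example the decorated function's name) keeps them stable across runs and unaffected by unrelated DAG changes. Python's built-in hash() is avoided because string hashing is randomized per process.

```python
import hashlib
from typing import Iterable, Set


def stable_hash(*parts: str, length: int = 8) -> str:
    """Deterministic short hex digest of the given parts.

    Unlike the built-in hash(), sha256 gives the same result in every process.
    """
    digest = hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()
    return digest[:length]


class TempNodeNamer:
    """Hypothetical helper that hands out readable, unique temporary node names.

    Names look like "__<prefix>_<hash>" and get a "_<n>" suffix only on collision.
    """

    def __init__(self, reserved: Iterable[str] = ()) -> None:
        # Names already in use (e.g. user-defined nodes) that must never be reused.
        self._taken: Set[str] = set(reserved)

    def name(self, prefix: str, *context: str) -> str:
        # Assumption: the "__" prefix marks the node as internal/generated.
        base = f"__{prefix}_{stable_hash(prefix, *context)}"
        candidate = base
        counter = 1
        # Low-digit counter resolves the (rare) case of an identical context.
        while candidate in self._taken:
            candidate = f"{base}_{counter}"
            counter += 1
        self._taken.add(candidate)
        return candidate
```

Because the hash depends only on the decorator's own inputs, adding or removing unrelated nodes does not change a generated name. The counter suffix does depend on registration order, so property (4) holds fully only when no collisions occur.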

We ran into this problem with the following simple test case:

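```python
import pandas as pd
from hamilton.function_modifiers import check_output

# `schema` is a pandera schema defined elsewhere (not shown in the original report).
@check_output(schema=schema, importance="fail")
def validated(raw: pd.DataFrame) -> pd.DataFrame:
    """Validate the raw input data."""
    return raw
```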

Here the validated function was not decorated correctly, so the schema was never applied. Renaming the function resolved the problem: there appeared to be a clash between the name of our node and a node generated internally by the decorator. The dataflow ran despite the clash, and there was no warning that the schema was not being applied. Thanks to @elijahbenizzy for helping debug and sort this out.
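The silent failure described here suggests that a name clash should be a hard error rather than something that resolves itself quietly. A minimal sketch of such a check, assuming a hypothetical registry (NodeSpec, NodeRegistry and NodeNameClashError are illustrative names, not Hamilton's API):

```python
from dataclasses import dataclass
from typing import Callable, Dict


class NodeNameClashError(ValueError):
    """Raised when two nodes would end up with the same name."""


@dataclass(frozen=True)
class NodeSpec:
    name: str
    fn: Callable
    origin: str  # e.g. "user function" or "generated by check_output"


class NodeRegistry:
    """Hypothetical registry that fails loudly on duplicate names
    instead of letting one node silently replace another."""

    def __init__(self) -> None:
        self._nodes: Dict[str, NodeSpec] = {}

    def add(self, spec: NodeSpec) -> None:
        existing = self._nodes.get(spec.name)
        if existing is not None:
            # Report both origins so the user can see which decorator caused it.
            raise NodeNameClashError(
                f"Node {spec.name!r} ({spec.origin}) clashes with an existing "
                f"node of the same name ({existing.origin})."
            )
        self._nodes[spec.name] = spec
```

With a check like this, the test case above would fail at DAG construction time with a message naming both the user function and the decorator-generated node, instead of running without the schema.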

(amosaikman, Jul 07 '23 06:07)

Marking as stale for now.

(skrawcz, Jul 18 '24 18:07)