deduplicate_inputs should not try to group inputs

Open 0x26res opened this issue 2 years ago • 1 comments

When calling Drive.visualize_execution with the flag deduplicate_inputs=True the resulting graph has got one node for each possible combination of inputs that are connected to the dag.

Current behavior

In the example below, inputs a and b should be indenpent nodes. But there is a is also a node for a and b. So you run the risk of having as many inputs node as there are combination of inputs in the downstream functions.

Screenshots

Steps to replicate behavior

def a_plus_b(a: int, b: int) -> int:
    return a + b


def a_times_2(a: int) -> int:
    return a * 2


def b_times_2(b: int) -> int:
    return b * 2


def everything(a_plus_b: int, a_times_2: int, b_times_2: int) -> int:
    return a_plus_b + a_times_2 + b_times_2

def test_wrong_inputs():
    driver = Driver(
        {},
        example_module,
        adapter=SimplePythonGraphAdapter(DictResult()),
    )
    inputs = {"a": 1, "b": 2}
    return driver.visualize_execution(
        final_vars=["everything"],
        inputs=inputs,
        output_file_path="./visualize_inputs.dot",
        deduplicate_inputs=True,
    )

Library & System Information

hamilton==1.42.0

Expected behavior

There should be one node of a, one for b and no node for a and b.

Jan 02 '24 16:01 0x26res

(copying my reply from a Slack thread)

What's displayed is the intended behavior of 2 changes compared to the early viz feature, but I overlooked one feature.

Change 1: In a viz, a function node actually has a single "input" node that contains "the set of inputs node" of the function node. Therefore, a_plus_b would always have the box with a and b. Motivation: this ways, inputs are always locate together to better understand what the node requires (would get very messy otherwise, following lengthy edges)

Change 2: Then, the deduplicate_inputs flag deduplicates "input nodes in the viz" which are actually "the set of inputs node" mentioned in Change 1. Motivation: If True, you guarantee that inputs are near the function node. If False, you can reduce clutter. It was difficult to pick a default because the best option depends on the specifics of the DAG

Potential solution: The issue raised appears related to Change 1 because there is currently no way to ungroup "the set of inputs node". It shouldn't be difficult to implement an group_inputs flag. Then, the deduplicate_inputs would work as expected. I now realize that the semantics of deduplicate_inputs are confusing given the grouping feature. If you have suggestions for a better argument name, I'll take it!

Jan 02 '24 17:01 zilto