deduplicate_inputs should not try to group inputs
When calling Drive.visualize_execution with the flag deduplicate_inputs=True the resulting graph has got one node for each possible combination of inputs that are connected to the dag.
Current behavior
In the example below, inputs a and b should be indenpent nodes. But there is a is also a node for a and b. So you run the risk of having as many inputs node as there are combination of inputs in the downstream functions.
Screenshots
Steps to replicate behavior
def a_plus_b(a: int, b: int) -> int:
return a + b
def a_times_2(a: int) -> int:
return a * 2
def b_times_2(b: int) -> int:
return b * 2
def everything(a_plus_b: int, a_times_2: int, b_times_2: int) -> int:
return a_plus_b + a_times_2 + b_times_2
def test_wrong_inputs():
driver = Driver(
{},
example_module,
adapter=SimplePythonGraphAdapter(DictResult()),
)
inputs = {"a": 1, "b": 2}
return driver.visualize_execution(
final_vars=["everything"],
inputs=inputs,
output_file_path="./visualize_inputs.dot",
deduplicate_inputs=True,
)
Library & System Information
hamilton==1.42.0
Expected behavior
There should be one node of a, one for b and no node for a and b.
(copying my reply from a Slack thread)
What's displayed is the intended behavior of 2 changes compared to the early viz feature, but I overlooked one feature.
Change 1: In a viz, a function node actually has a single "input" node that contains "the set of inputs node" of the function node. Therefore, a_plus_b would always have the box with a and b.
Motivation: this ways, inputs are always locate together to better understand what the node requires (would get very messy otherwise, following lengthy edges)
Change 2: Then, the deduplicate_inputs flag deduplicates "input nodes in the viz" which are actually "the set of inputs node" mentioned in Change 1.
Motivation: If True, you guarantee that inputs are near the function node. If False, you can reduce clutter. It was difficult to pick a default because the best option depends on the specifics of the DAG
Potential solution: The issue raised appears related to Change 1 because there is currently no way to ungroup "the set of inputs node". It shouldn't be difficult to implement an group_inputs flag. Then, the deduplicate_inputs
would work as expected. I now realize that the semantics of deduplicate_inputs are confusing given the grouping feature. If you have suggestions for a better argument name, I'll take it!