data icon indicating copy to clipboard operation
data copied to clipboard

[Discussion] Expected graph when `traverse` a DataPipe with circular references

Open ejguan opened this issue 3 years ago • 4 comments

🐛 Describe the bug

I am trying to understand what would be the expected result when we traverse a DataPipe graph containing circular references. Assume we have the following reference map:

graph TD;
DP1-->DP2;
DP2-->DP3;
DP3-->DP1;
DP3-->output;

When we traverse DP3, what would be expected result?

  • {DP3: {DP2: {DP1: {}}}} (Current behavior)
  • {DP3: {DP2: {DP1: {DP3: {}}}}}

Versions

master on pytorch main on torchdata

ejguan avatar Jun 22 '22 17:06 ejguan

In the graph above, DP1 depends on DP3, why would that be the case? DP3 modifying some state that DP1 depends on? Is there an example of an use case of this situation?

NivekT avatar Jun 22 '22 19:06 NivekT

@NivekT This custom DataPipe has this kind of dependency hierarchy. https://github.com/pytorch/pytorch/blob/581e846d9def4af9682ba84b215a0545410a551b/test/test_datapipe.py#L2184-L2204

ejguan avatar Jun 22 '22 19:06 ejguan

Should we try to detect such loops during DP graph construction

VitalyFedyunin avatar Jul 06 '22 19:07 VitalyFedyunin

Should we try to detect such loops during DP graph construction

Do you mean raising an Error if there is a loop? But, we added this test because TorchVision team wants such behavior explicitly

Like the cifar dataset, it's a IterDataPipe but a few functions like this, this and this are attached to this class. Then, when the DataPipe graph is constructed, those functions are forwarded to dependent DataPipes https://github.com/pytorch/vision/blob/e75a333782bb5d5ffdf5355e766eb5937fc6697c/torchvision/prototype/datasets/_builtin/cifar.py#L80-L81

This causes the circular references.

ejguan avatar Jul 08 '22 21:07 ejguan

Closing as the decision has been made that traverse_dps will stop when it has seen the datapipe before

ejguan avatar Dec 13 '22 15:12 ejguan