[Discussion] Expected graph when `traverse` a DataPipe with circular references
🐛 Describe the bug
I am trying to understand what would be the expected result when we traverse a DataPipe graph containing circular references.
Assume we have the following reference map:
graph TD;
DP1-->DP2;
DP2-->DP3;
DP3-->DP1;
DP3-->output;
When we traverse DP3, what would be expected result?
{DP3: {DP2: {DP1: {}}}}(Current behavior){DP3: {DP2: {DP1: {DP3: {}}}}}
Versions
master on pytorch main on torchdata
In the graph above, DP1 depends on DP3, why would that be the case? DP3 modifying some state that DP1 depends on? Is there an example of an use case of this situation?
@NivekT This custom DataPipe has this kind of dependency hierarchy. https://github.com/pytorch/pytorch/blob/581e846d9def4af9682ba84b215a0545410a551b/test/test_datapipe.py#L2184-L2204
Should we try to detect such loops during DP graph construction
Should we try to detect such loops during DP graph construction
Do you mean raising an Error if there is a loop? But, we added this test because TorchVision team wants such behavior explicitly
Like the cifar dataset, it's a IterDataPipe but a few functions like this, this and this are attached to this class. Then, when the DataPipe graph is constructed, those functions are forwarded to dependent DataPipes
https://github.com/pytorch/vision/blob/e75a333782bb5d5ffdf5355e766eb5937fc6697c/torchvision/prototype/datasets/_builtin/cifar.py#L80-L81
This causes the circular references.
Closing as the decision has been made that traverse_dps will stop when it has seen the datapipe before