dgl
Cloning a DGLBlock breaks the underlying metagraph, making the clone unusable.
🐛 Bug
When a DGLBlock is cloned, the duplicated srctype and dsttype are merged back into a single node type in the clone's metagraph, which makes the clone unusable. This appears to happen because the metagraph is reconstructed purely from the meta-edges, which loses the information that the src and dst types are distinct: https://github.com/dmlc/dgl/blob/master/python/dgl/heterograph.py#L5604
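To illustrate the suspected mechanism, here is a minimal pure-Python model of it. This is only a sketch, not DGL's code: the helpers `block_metagraph` and `metagraph_from_edges` and the `user`/`follows` type names are hypothetical. It assumes a block keeps one metagraph vertex per source type and one per destination type, while a metagraph rebuilt from the meta-edges identifies vertices by type name alone.

```python
# Simplified model of the suspected bug; these helpers are hypothetical
# and are not part of DGL's API.

def block_metagraph(canonical_etypes):
    """Bipartite metagraph: source and destination types are separate vertices."""
    srctypes = sorted({src for src, _, _ in canonical_etypes})
    dsttypes = sorted({dst for _, _, dst in canonical_etypes})
    # Source vertices come first, destination vertices after them.
    return [("SRC", t) for t in srctypes] + [("DST", t) for t in dsttypes]

def metagraph_from_edges(canonical_etypes):
    """Metagraph rebuilt from meta-edges only: same-named types collapse."""
    return sorted({t for src, _, dst in canonical_etypes for t in (src, dst)})

# A block whose source and destination sides share the node type name "user".
etypes = [("user", "follows", "user")]

original = block_metagraph(etypes)      # two vertices: SRC/user and DST/user
rebuilt = metagraph_from_edges(etypes)  # one vertex: user

# The block still addresses its destination type by its original vertex id,
# which no longer exists in the rebuilt metagraph.
dst_vertex_id = original.index(("DST", "user"))
has_vertex = dst_vertex_id < len(rebuilt)
print(f"dst vertex id {dst_vertex_id} valid in rebuilt metagraph: {has_vertex}")
```

In this model the destination type keeps vertex id 1 while the rebuilt metagraph has a single vertex, which would be consistent with the `Invalid vertex type: 1` check failure in the traceback.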
To Reproduce
Take a block produced by the dataloader, call .clone() on it, and attempt to print the result:
Traceback (most recent call last):
  File "node_classification.py", line 143, in <module>
    train(args, device, g, dataset, model)
  File "node_classification.py", line 106, in train
    print(blocks[0].clone())
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/heterograph.py", line 6383, in __repr__
    dstnode=self.number_of_dst_nodes(),
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/heterograph.py", line 2415, in number_of_dst_nodes
    return self.num_dst_nodes(ntype)
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/heterograph.py", line 2475, in num_dst_nodes
    for nty in self.dsttypes])
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/heterograph.py", line 2475, in <listcomp>
    for nty in self.dsttypes])
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/heterograph_index.py", line 337, in number_of_nodes
    return _CAPI_DGLHeteroNumVertices(self, int(ntype))
  File "dgl/_ffi/_cython/./function.pxi", line 293, in core.FunctionBase.__call__
  File "dgl/_ffi/_cython/./function.pxi", line 225, in core.FuncCall
  File "dgl/_ffi/_cython/./function.pxi", line 215, in core.FuncCall3
dgl._ffi.base.DGLError: [17:49:41] /home/dominique/src/dgl/src/graph/./heterograph.h:76: Check failed: meta_graph_->HasVertex(vtype): Invalid vertex type: 1
Stack trace:
[bt] (0) /home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/libdgl.so(+0x6e0332) [0x7f232ea13332]
[bt] (1) /home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/libdgl.so(dgl::HeteroGraph::NumVertices(unsigned long) const+0x9d) [0x7f232ea1bc5d]
[bt] (2) /home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/libdgl.so(+0x6f127e) [0x7f232ea2427e]
[bt] (3) /home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/libdgl.so(DGLFuncCall+0x73) [0x7f232e9a7c93]
[bt] (4) /home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/_ffi/_cy3/core.cpython-36m-x86_64-linux-gnu.so(+0x19a03) [0x7f22fc49ca03]
[bt] (5) /home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/_ffi/_cy3/core.cpython-36m-x86_64-linux-gnu.so(+0x1a7a7) [0x7f22fc49d7a7]
[bt] (6) python3(_PyObject_FastCallKeywords+0x19c) [0x5a8b3c]
[bt] (7) python3() [0x50a9a3]
[bt] (8) python3(_PyEval_EvalFrameDefault+0x444) [0x50c414]
Do you have any workaround for this? If not we will label this issue as high priority.
I don't have a workaround.
The bug does not currently show up within DGL itself, but I hit it when trying to modify the dataloader to clone a DGLBlock before passing it to the training process.
Sorry for the late response. @nv-dlasalle can you help us understand your use case here?
We are redesigning the dataloader and may simplify DGLBlock. If that happens, this problem should be resolved automatically.
We will revisit this bug later. If this is an important use case, please let us know.