
Cloning DGLBlock breaks the underlying metagraph, making it unusable.

Open nv-dlasalle opened this issue 2 years ago • 3 comments

🐛 Bug

When a DGLBlock is cloned, a srctype and a dsttype that share the same name get merged back into a single node type in the metagraph, which makes the clone unusable. This appears to happen because the metagraph is rebuilt purely from the meta-edges, which loses the information that the src and dst types are distinct: https://github.com/dmlc/dgl/blob/master/python/dgl/heterograph.py#L5604
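To make the suspected failure mode concrete, here is a minimal pure-Python sketch of two ways a block's metagraph could be built. It is not DGL's code: it assumes that a block numbers its source types first and its destination types after them, and it models the lossy path as assigning one vertex per distinct type name found in the meta-edges. The function names build_block_metagraph and rebuild_metagraph_by_name are hypothetical.

```python
# Hypothetical, simplified model of a block's metagraph; not DGL's implementation.

def build_block_metagraph(srctypes, dsttypes, canonical_etypes):
    """Keep source and destination node types as separate metagraph vertices.

    Source types get IDs 0..len(srctypes)-1; destination types are numbered
    after them, so a name that is both a srctype and a dsttype occupies
    two distinct vertices.
    """
    src_ids = {t: i for i, t in enumerate(srctypes)}
    dst_ids = {t: len(srctypes) + i for i, t in enumerate(dsttypes)}
    meta_edges = [(src_ids[s], dst_ids[d]) for s, _, d in canonical_etypes]
    return len(srctypes) + len(dsttypes), meta_edges


def rebuild_metagraph_by_name(canonical_etypes):
    """Lossy rebuild: one vertex per distinct node-type *name* in the meta-edges.

    Nothing records which side (src or dst) a name came from, so a srctype
    and a dsttype with the same name collapse into a single vertex.
    """
    ntype_ids = {}
    meta_edges = []
    for s, _, d in canonical_etypes:
        for name in (s, d):
            ntype_ids.setdefault(name, len(ntype_ids))
        meta_edges.append((ntype_ids[s], ntype_ids[d]))
    return len(ntype_ids), meta_edges


if __name__ == "__main__":
    # A block sampled from a homogeneous graph uses DGL's default names:
    # srctypes == dsttypes == ['_N'], with one edge type ('_N', '_E', '_N').
    etypes = [("_N", "_E", "_N")]
    print(build_block_metagraph(["_N"], ["_N"], etypes))  # (2, [(0, 1)])
    print(rebuild_metagraph_by_name(etypes))              # (1, [(0, 0)])
```

In this model, the lossy rebuild turns the src-to-dst meta-edge (0, 1) of a homogeneous block into a self-loop on a single vertex, which is the kind of merge described in the report.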

To Reproduce

Call .clone() on a block produced by the dataloader, then try to print the result:

```
Traceback (most recent call last):
  File "node_classification.py", line 143, in <module>
    train(args, device, g, dataset, model)
  File "node_classification.py", line 106, in train
    print(blocks[0].clone())
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/heterograph.py", line 6383, in __repr__
    dstnode=self.number_of_dst_nodes(),
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/heterograph.py", line 2415, in number_of_dst_nodes
    return self.num_dst_nodes(ntype)
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/heterograph.py", line 2475, in num_dst_nodes
    for nty in self.dsttypes])
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/heterograph.py", line 2475, in <listcomp>
    for nty in self.dsttypes])
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/heterograph_index.py", line 337, in number_of_nodes
    return _CAPI_DGLHeteroNumVertices(self, int(ntype))
  File "dgl/_ffi/_cython/./function.pxi", line 293, in core.FunctionBase.__call__
  File "dgl/_ffi/_cython/./function.pxi", line 225, in core.FuncCall
  File "dgl/_ffi/_cython/./function.pxi", line 215, in core.FuncCall3
dgl._ffi.base.DGLError: [17:49:41] /home/dominique/src/dgl/src/graph/./heterograph.h:76: Check failed: meta_graph_->HasVertex(vtype): Invalid vertex type: 1
Stack trace:
  [bt] (0) /home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/libdgl.so(+0x6e0332) [0x7f232ea13332]
  [bt] (1) /home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/libdgl.so(dgl::HeteroGraph::NumVertices(unsigned long) const+0x9d) [0x7f232ea1bc5d]
  [bt] (2) /home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/libdgl.so(+0x6f127e) [0x7f232ea2427e]
  [bt] (3) /home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/libdgl.so(DGLFuncCall+0x73) [0x7f232e9a7c93]
  [bt] (4) /home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/_ffi/_cy3/core.cpython-36m-x86_64-linux-gnu.so(+0x19a03) [0x7f22fc49ca03]
  [bt] (5) /home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/_ffi/_cy3/core.cpython-36m-x86_64-linux-gnu.so(+0x1a7a7) [0x7f22fc49d7a7]
  [bt] (6) python3(_PyObject_FastCallKeywords+0x19c) [0x5a8b3c]
  [bt] (7) python3() [0x50a9a3]
  [bt] (8) python3(_PyEval_EvalFrameDefault+0x444) [0x50c414]
```
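The last frame of the traceback fails the check meta_graph_->HasVertex(vtype) for vertex type 1. One plausible reading, sketched here under a simplified model: in a block, destination type IDs come after all source type IDs, so the lone dsttype '_N' gets ID 1; once the clone's metagraph has collapsed to a single vertex, that ID no longer exists. The helpers dst_type_id and ToyGraph are hypothetical, and the ID-offset convention is an assumption rather than a quote of DGL's implementation.

```python
# Hypothetical, simplified model of the lookup in the traceback; not DGL's code.

def dst_type_id(srctypes, dsttypes, ntype):
    """Assumed block convention: destination type IDs follow all source type IDs."""
    return len(srctypes) + dsttypes.index(ntype)


class ToyGraph:
    """Stand-in for a graph whose metagraph has one vertex per entry in nodes_per_type."""

    def __init__(self, nodes_per_type):
        self.nodes_per_type = list(nodes_per_type)

    def num_vertices(self, vtype):
        # Analogue of the meta_graph_->HasVertex(vtype) check in heterograph.h.
        if not 0 <= vtype < len(self.nodes_per_type):
            raise ValueError(f"Invalid vertex type: {vtype}")
        return self.nodes_per_type[vtype]


if __name__ == "__main__":
    srctypes = dsttypes = ["_N"]
    vtype = dst_type_id(srctypes, dsttypes, "_N")  # 1

    block = ToyGraph([5, 2])  # hypothetical counts: 5 src nodes, 2 dst nodes
    print(block.num_vertices(vtype))  # 2

    clone = ToyGraph([5])  # src and dst '_N' merged into one metagraph vertex
    try:
        clone.num_vertices(vtype)
    except ValueError as err:
        print(err)  # Invalid vertex type: 1
```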

nv-dlasalle, Aug 16 '22 01:08

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot], Sep 22 '22 01:09

Do you have any workaround for this? If not, we will label this issue as high priority.

BarclayII, Sep 26 '22 06:09

I don't have a workaround.

It doesn't currently cause any symptoms in DGL itself, but I hit it while modifying the dataloader to clone a DGLBlock before passing it to the training process.

nv-dlasalle, Sep 26 '22 20:09

Sorry for the late response. @nv-dlasalle, can you help us understand your use case here?

We are redesigning the dataloader and may simplify DGLBlock as part of that. If this happens, the problem will be solved automatically.

We will revisit this bug later. If this is an important use case, please let us know.

frozenbugs, Mar 15 '23 02:03