
problem in batching wrt schemes and subgraphs

Open fantes opened this issue 1 year ago • 3 comments

🐛 Bug

When batching graphs, schemes can be incorrectly checked in the case where a scheme is present but the number of edges with this scheme is 0.

To Reproduce

  1. create two simple hetero graphs
     g1 = dgl.heterograph({("n", "r1", "n"): ([0], [1]), ("n", "r2", "n"): ((), ())})
     g2 = dgl.heterograph({("n", "r1", "n"): ([0], [1]), ("n", "r2", "n"): ((), ())})

  2. add a new edge to g2, with some data
     g2.add_edges([0], [2], etype="r2", data={"r": torch.tensor([1])})

  3. extract subgraphs, removing edge type "r2" completely
     g3 = dgl.node_subgraph(g1, [0, 1])
     g4 = dgl.node_subgraph(g2, [0, 1])

  4. we then have:
     g3.edge_attr_schemes(etype="r2") gives {'_ID': Scheme(shape=(), dtype=torch.int64)}
     g3.num_edges(etype="r2") gives 0
     g4.num_edges(etype="r2") also gives 0 (as node 2 was removed, so was the only edge of type "r2")
     BUT g4.edge_attr_schemes(etype="r2") gives {'r': Scheme(shape=(), dtype=torch.int64), '_ID': Scheme(shape=(), dtype=torch.int64)}

  5. so, when trying to batch these graphs (a consolidated script follows the traceback below):
     b = dgl.batch([g3, g4])
     we get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/dgl/batch.py", line 217, in batch
    ret_feat = _batch_feat_dicts(
  File "/usr/local/lib/python3.10/dist-packages/dgl/batch.py", line 247, in _batch_feat_dicts
    utils.check_all_same_schema(schemas, feat_dict_name)
  File "/usr/local/lib/python3.10/dist-packages/dgl/utils/checks.py", line 207, in check_all_same_schema
    raise DGLError(
dgl._ffi.base.DGLError: Expect all graphs to have the same schema on edges[('n', 'r2', 'n')].data, but graph 1 got
	{'r': Scheme(shape=(), dtype=torch.int64), '_ID': Scheme(shape=(), dtype=torch.int64)}
which is different from
	{'_ID': Scheme(shape=(), dtype=torch.int64)}.
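
For completeness, here is the whole reproduction collected into a single script (PyTorch backend, DGL 1.1.2):

import dgl
import torch

# two hetero graphs whose "r2" relation starts out with zero edges
g1 = dgl.heterograph({("n", "r1", "n"): ([0], [1]), ("n", "r2", "n"): ((), ())})
g2 = dgl.heterograph({("n", "r1", "n"): ([0], [1]), ("n", "r2", "n"): ((), ())})

# add one "r2" edge carrying a feature, to g2 only
g2.add_edges([0], [2], etype="r2", data={"r": torch.tensor([1])})

# node subgraphs on nodes {0, 1} drop that "r2" edge again
g3 = dgl.node_subgraph(g1, [0, 1])
g4 = dgl.node_subgraph(g2, [0, 1])

print(g3.num_edges(etype="r2"), g4.num_edges(etype="r2"))  # 0 0
print(g3.edge_attr_schemes(etype="r2"))  # {'_ID': Scheme(...)}
print(g4.edge_attr_schemes(etype="r2"))  # {'r': Scheme(...), '_ID': Scheme(...)}

b = dgl.batch([g3, g4])  # raises the DGLError above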

Expected behavior

correct batching, either by removing the scheme when the number of edges of a given edge type drops to 0, or by any other means
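
One possible user-side workaround, as a sketch only (not an official DGL API), is to drop the stale features from edge types that end up with zero edges before batching. This assumes that, for a zero-edge type, g.edges[etype].data still exposes the stale keys and supports pop() like a dict; it reuses g3 and g4 from the reproduction above:

def drop_empty_edge_schemes(g):
    # remove every (empty) feature tensor from edge types that have no edges,
    # so all graphs expose identical edge schemes before dgl.batch
    for etype in g.canonical_etypes:
        if g.num_edges(etype) == 0:
            for key in list(g.edges[etype].data.keys()):
                g.edges[etype].data.pop(key)
    return g

b = dgl.batch([drop_empty_edge_schemes(g3), drop_empty_edge_schemes(g4)])

After this, both graphs should report an empty scheme for "r2" (the zero-length '_ID' mapping is dropped as well, which loses no information), and dgl.batch should no longer raise.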

Environment

  • DGL Version (e.g., 1.0): 1.1.2
  • Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): pytorch 2.1.0
  • OS (e.g., Linux): Linux
  • How you installed DGL (conda, pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.10
  • CUDA/cuDNN version (if applicable): 11.7
  • GPU models and configuration (e.g. V100):
  • Any other relevant information:

fantes avatar Dec 04 '23 15:12 fantes

Yes, it's a current limitation that mutating a batched graph structure will corrupt its batching information. Could you try mutating graphs before batching?

jermainewang avatar Dec 13 '23 18:12 jermainewang

Hi. I am not sure I understand. In my example, batching is done as the last step; could you explain what "mutating graphs before batching" means? Thanks a lot

fantes avatar Dec 13 '23 18:12 fantes

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

github-actions[bot] avatar Jan 13 '24 01:01 github-actions[bot]