dgl icon indicating copy to clipboard operation
dgl copied to clipboard

Add option for preserving batch information when removing self loops.

Open nv-dlasalle opened this issue 3 years ago • 2 comments

🚀 Feature

It seems unlikely that a users calling dgl.remove_self_loop() (https://docs.dgl.ai/generated/dgl.remove_self_loop.html?highlight=remove_self_loop#dgl.remove_self_loop) would desire the side effect of batch information being removed. It can also make code confusing to read:

graph = load_batched_graphs()
batch_nodes = graph.batch_num_nodes()
graph = dgl.remove_self_loops(graph)
recalculate_batch_info(graph, batch_nodes)

I'm requesting we add a parameter preserve_batch to remove_self_loops:

def remove_self_loop(g, etype=None, preserve_batch=True)

Whether it should default to True or False would depend on what release it makes it in--i.e., are we fine changing the default behavior.

nv-dlasalle avatar Aug 02 '22 17:08 nv-dlasalle

Just a note (for when searching in the code) that 2 of the occurrences above are remove_self_loops, but the real function name is singular, remove_self_loop.

ndickson-nvidia avatar Aug 03 '22 00:08 ndickson-nvidia

This appears to be fixed by https://github.com/dmlc/dgl/pull/3119, but:

  1. the docs need to be updated,
  2. and we should add a unit test to ensure batch information continues to be preserved when removing self-loops (https://github.com/dmlc/dgl/blob/master/tests/compute/test_transform.py#L1750).

nv-dlasalle avatar Aug 04 '22 23:08 nv-dlasalle