dgl Add option for preserving batch information when removing self loops.

🚀 Feature

It seems unlikely that a users calling dgl.remove_self_loop() (https://docs.dgl.ai/generated/dgl.remove_self_loop.html?highlight=remove_self_loop#dgl.remove_self_loop) would desire the side effect of batch information being removed. It can also make code confusing to read:

graph = load_batched_graphs()
batch_nodes = graph.batch_num_nodes()
graph = dgl.remove_self_loops(graph)
recalculate_batch_info(graph, batch_nodes)

I'm requesting we add a parameter preserve_batch to remove_self_loops:

def remove_self_loop(g, etype=None, preserve_batch=True)

Whether it should default to True or False would depend on what release it makes it in--i.e., are we fine changing the default behavior.

Aug 02 '22 17:08 nv-dlasalle

Just a note (for when searching in the code) that 2 of the occurrences above are remove_self_loops, but the real function name is singular, remove_self_loop.

Aug 03 '22 00:08 ndickson-nvidia

This appears to be fixed by https://github.com/dmlc/dgl/pull/3119, but:

the docs need to be updated,
and we should add a unit test to ensure batch information continues to be preserved when removing self-loops (https://github.com/dmlc/dgl/blob/master/tests/compute/test_transform.py#L1750).

Aug 04 '22 23:08 nv-dlasalle