deepsnap Default 'remove_self_loops' set to True in heterogeneous link prediction example problematic for use case with multiple node types?

Default 'remove_self_loops' set to True in heterogeneous link prediction example problematic for use case with multiple node types?

Open anniekmyatt opened this issue 3 years ago • 2 comments

Hello, I am working on setting up link prediction on a heterogeneous graph with different node types and used the examples in deepsnap/examples/link_prediction_hetero as a starting point. In these examples, the HeteroSAGEConv layer is used, with its default of setting 'remove_self_loops' to true.

Could this be problematic when I have two different node types, as the edge index for the links between these could have links between, for example, node 0 of one node type and node 0 of the other node type? In fact, this is what happened for me.

As far as I understood the negative sampling algorithm in the hetero_graph/negative_sampling() method expects the indices for both node types to start from zero, rather than have unique indices, so the solution can't be to create unique node indices across the different node types?

If I am correct, it may be worth adding a comment on the need to remove self-loops in careful pre-processing instead of using the default functionality in the HeteroSAGEConv layer? Another option could be to add a check if the source and destination node type are the same in the HeteroSAGEConv implementation, before running pyg_utils.remove_self_loops(edge_index)? I guess it doesn't really make sense to remove self loops in the case where the source and destination node type is different anyway?

If I have misunderstood the situation, I would very much appreciate to be corrected!

Thank you!

May 28 '21 08:05 anniekmyatt

Hi,

Yes, the current remove_self_loops=True will be problematic when the node types are different. Thanks for pointing out this problem! I will update the HeteroSAGEConv layer recently.

May 31 '21 07:05 zechengz

Thank you, I’m glad it is useful! I appreciate that the impact in reality will likely be small but it is still good to not have unintended behaviour.

Thank you for your quick reply!

May 31 '21 09:05 anniekmyatt

deepsnap deepsnap copied to clipboard

Default 'remove_self_loops' set to True in heterogeneous link prediction example problematic for use case with multiple node types?

deepsnap
deepsnap copied to clipboard