deepsnap
deepsnap copied to clipboard
Default 'remove_self_loops' set to True in heterogeneous link prediction example problematic for use case with multiple node types?
Hello,
I am working on setting up link prediction on a heterogeneous graph with different node types and used the examples in deepsnap/examples/link_prediction_hetero
as a starting point. In these examples, the HeteroSAGEConv
layer is used, with its default of setting 'remove_self_loops' to true.
Could this be problematic when I have two different node types, as the edge index for the links between these could have links between, for example, node 0 of one node type and node 0 of the other node type? In fact, this is what happened for me.
As far as I understood the negative sampling algorithm in the hetero_graph/negative_sampling()
method expects the indices for both node types to start from zero, rather than have unique indices, so the solution can't be to create unique node indices across the different node types?
If I am correct, it may be worth adding a comment on the need to remove self-loops in careful pre-processing instead of using the default functionality in the HeteroSAGEConv layer? Another option could be to add a check if the source and destination node type are the same in the HeteroSAGEConv implementation, before running pyg_utils.remove_self_loops(edge_index)? I guess it doesn't really make sense to remove self loops in the case where the source and destination node type is different anyway?
If I have misunderstood the situation, I would very much appreciate to be corrected!
Thank you!
Hi,
Yes, the current remove_self_loops=True
will be problematic when the node types are different. Thanks for pointing out this problem! I will update the HeteroSAGEConv
layer recently.
Thank you, I’m glad it is useful! I appreciate that the impact in reality will likely be small but it is still good to not have unintended behaviour.
Thank you for your quick reply!