Rationale behind degree-grouping of node pairs
Hi,
Thank you for this project. I'm using it for my master's thesis and it's been really helpful.
If I understand the degree-grouping of node pairs correctly, the null distribution for a DWPC corresponding to a (source node, target node, metapath) triple is based on the permuted DWPCs of that triple + all permuted DWPCs of other node pairs with the same metapath and matching source and target node degrees. If that's the case, then all pairs sharing the same (source degree, target degree, metapath) will share the same null distribution for their observed DWPC. I realize this approach is meant to increase the sample size of null distributions, especially since only 200 permutations of the original graph were generated in your original paper, but I’m wondering how sound this is from a statistical standpoint. It seems odd that the null distribution for DWPCs is not uniquely tailored to each node pair. Can you please shed some light on this?
> the null distribution for a DWPC corresponding to a (source node, target node, metapath) triple is based on the permuted DWPCs of that triple + all permuted DWPCs of other node pairs with the same metapath and matching source and target node degrees
Correct, that is an accurate description of degree-grouping to construct the null distribution.
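To make the pooling concrete, here is a minimal Python sketch for a single metapath. The data layout is an assumption for illustration: `permuted_dwpcs` holds one dict per permuted graph mapping `(source, target)` to that pair's DWPC, and `source_degree`/`target_degree` map each node to the degree that permutation preserves (assumed here to be its degree along the metapath's first or last metaedge). Both function names are hypothetical. `empirical_p_value` is a simple stand-in and may differ from how the project actually turns the null distribution into a p-value.

```python
from collections import defaultdict


def degree_grouped_nulls(permuted_dwpcs, source_degree, target_degree):
    """Pool permuted DWPCs for one metapath by (source degree, target degree).

    permuted_dwpcs: one dict per permuted graph, mapping (source, target)
        to that pair's DWPC in the permuted graph (hypothetical layout).
    source_degree, target_degree: dicts mapping node -> preserved degree.
    """
    nulls = defaultdict(list)
    for perm in permuted_dwpcs:
        for (source, target), dwpc in perm.items():
            key = (source_degree[source], target_degree[target])
            nulls[key].append(dwpc)
    return nulls


def empirical_p_value(observed, null):
    """Fraction of null DWPCs at least as large as the observed DWPC (illustrative)."""
    return sum(value >= observed for value in null) / len(null)


# Toy example: two permutations, three source nodes, one target node.
source_degree = {"s1": 2, "s2": 2, "s3": 1}
target_degree = {"t1": 1}
permuted_dwpcs = [
    {("s1", "t1"): 0.2, ("s2", "t1"): 0.0, ("s3", "t1"): 0.5},
    {("s1", "t1"): 0.1, ("s2", "t1"): 0.3, ("s3", "t1"): 0.4},
]
nulls = degree_grouped_nulls(permuted_dwpcs, source_degree, target_degree)
print(nulls[(2, 1)])  # [0.2, 0.0, 0.1, 0.3]: s1-t1 and s2-t1 share this null
print(empirical_p_value(0.25, nulls[(2, 1)]))  # 0.25
```

Here `("s1", "t1")` and `("s2", "t1")` are scored against the identical four-value null because they share the degree key `(2, 1)`. With 200 permutations, a group of k such pairs yields a null of 200 × k values instead of 200.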
> I’m wondering how sound this is from a statistical standpoint. It seems odd that the null distribution for DWPCs is not uniquely tailored to each node pair. Can you please shed some light on this?
The key insight as to why degree grouping is appropriate is described here:
> As permutation preserves only node degree, node pairs with equal degree are equivalent in permutations.
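With XSwap, degree is preserved, but edges should otherwise be randomized within their type (metaedge). Therefore a node's type and degree should be the only truly distinguishing characteristics of that node after permutation. Consequently, node pairs that share a metapath and have the same source and target degrees are interchangeable in the permuted graphs, which is what justifies pooling their permuted DWPCs into a single null distribution.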
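For intuition on why degree is the only node property that survives permutation, here is a minimal XSwap-style sketch for one bipartite metaedge (for example, Compound-binds-Gene). It is illustrative only and not the project's permutation code: `xswap` and `degrees` are hypothetical names, and the acceptance rule used here (reject any swap that would create a duplicate edge) is the simplest possible version.

```python
import random
from collections import Counter


def xswap(edges, n_attempts, seed=0):
    """XSwap-style degree-preserving permutation of one bipartite metaedge.

    Each attempt picks two edges (a, b) and (c, d) and rewires them to
    (a, d) and (c, b). Swaps that would create a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_i, new_j = (a, d), (c, b)
        if new_i in edge_set or new_j in edge_set:
            # Would duplicate an edge; this also rejects a == c or b == d.
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new_i, new_j}
        edges[i], edges[j] = new_i, new_j
    return edges


def degrees(edges):
    """Return (source degree, target degree) counters for an edge list."""
    return Counter(s for s, _ in edges), Counter(t for _, t in edges)


original = [("c1", "g1"), ("c1", "g2"), ("c2", "g2"), ("c3", "g3"), ("c3", "g1")]
permuted = xswap(original, n_attempts=100, seed=42)
print(degrees(permuted) == degrees(original))  # True
```

Each accepted swap changes which nodes are connected but leaves every node's degree unchanged, so after many swaps two nodes of the same type and degree carry no other information that could tell them apart.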
Does that make sense?