hetmatpy

Rationale behind degree-grouping of node pairs

Open • dgoncalo opened this issue 7 months ago • 1 comment

Hi,

Thank you for this project. I'm using it for my master's thesis and it's been really helpful.

If I understand the degree-grouping of node pairs correctly, the null distribution for a DWPC corresponding to a (source node, target node, metapath) triple is based on the permuted DWPCs of that triple + all permuted DWPCs of other node pairs with the same metapath and matching source and target node degrees. If that's the case, then all pairs sharing the same (source degree, target degree, metapath) will share the same null distribution for their observed DWPC. I realize this approach is meant to increase the sample size of null distributions, especially since only 200 permutations of the original graph were generated in your original paper, but I’m wondering how sound this is from a statistical standpoint. It seems odd that the null distribution for DWPCs is not uniquely tailored to each node pair. Can you please shed some light on this?

dgoncalo • May 22 '25 22:05

> the null distribution for a DWPC corresponding to a (source node, target node, metapath) triple is based on the permuted DWPCs of that triple + all permuted DWPCs of other node pairs with the same metapath and matching source and target node degrees

Correct, that is an accurate description of degree-grouping to construct the null distribution.
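
To make the pooling concrete, here is a minimal sketch of degree-grouping for a single metapath. It is not hetmatpy's API: the function names, the toy data, and the empirical p-value at the end are all assumptions made for illustration, and the p-value is not claimed to be how hetmatpy summarizes the null distribution.

```python
from collections import defaultdict

def pool_null_by_degree(permuted_dwpcs, source_degree, target_degree):
    """Pool permuted DWPCs over node pairs sharing (source degree, target degree).

    permuted_dwpcs maps (source, target) -> list of DWPCs, one per permuted graph.
    All names here are hypothetical, not part of hetmatpy.
    """
    pooled = defaultdict(list)
    for (source, target), values in permuted_dwpcs.items():
        key = (source_degree[source], target_degree[target])
        pooled[key].extend(values)
    return dict(pooled)

def empirical_p_value(observed, null_values):
    """Fraction of pooled null DWPCs at least as large as the observed DWPC."""
    return sum(v >= observed for v in null_values) / len(null_values)

# Toy data (made up): three node pairs, three permutations each.
permuted_dwpcs = {
    ("a", "x"): [0.0, 1.0, 2.0],
    ("b", "y"): [1.0, 3.0, 0.0],
    ("c", "x"): [5.0, 4.0, 6.0],
}
source_degree = {"a": 2, "b": 2, "c": 7}
target_degree = {"x": 3, "y": 3}

pooled = pool_null_by_degree(permuted_dwpcs, source_degree, target_degree)

# Pairs (a, x) and (b, y) both have degrees (2, 3), so they share one null
# distribution of six values instead of three each.
null_for_a_x = pooled[(2, 3)]
p = empirical_p_value(2.0, null_for_a_x)
```

In this toy case, the pairs `("a", "x")` and `("b", "y")` end up with the same null distribution, while `("c", "x")` keeps its own because its source degree differs.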

> I’m wondering how sound this is from a statistical standpoint. It seems odd that the null distribution for DWPCs is not uniquely tailored to each node pair. Can you please shed some light on this?

The key insight as to why degree grouping is appropriate is described here:

> As permutation preserves only node degree, node pairs with equal degree are equivalent in permutations.

With XSwap, degree is preserved, but edges should otherwise be randomized within their type (metaedge). Therefore, a node's type and its degree should be its only truly distinguishing characteristics after permutation.
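
As a rough illustration of why degree survives permutation, here is an XSwap-style sketch for one edge type between two distinct node types. It is a simplified stand-in written for this thread, not the actual XSwap implementation; the function name, the attempt count, and the toy edges are assumptions.

```python
import random
from collections import Counter

def degree_preserving_swaps(edges, n_attempts, seed=0):
    """Randomize unique (source, target) edges while keeping every node's degree.

    Hypothetical helper: each attempt picks two edges (a, b) and (c, d)
    and rewires them to (a, d) and (c, b).
    """
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        # Skip swaps that would duplicate an existing edge. This also
        # covers the no-op cases where a == c or b == d.
        if new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges

# Toy edges of a single type (made up for illustration).
edges = [("g1", "d1"), ("g1", "d2"), ("g2", "d2"), ("g3", "d3"), ("g3", "d1")]
permuted = degree_preserving_swaps(edges, n_attempts=100)

# Each node keeps its degree, even though its neighbors may change.
source_degrees_before = Counter(s for s, _ in edges)
source_degrees_after = Counter(s for s, _ in permuted)
target_degrees_before = Counter(t for _, t in edges)
target_degrees_after = Counter(t for _, t in permuted)
```

Because each accepted swap only exchanges targets between two edges, a node's degree is unchanged while its specific neighbors are not, which is why two nodes of the same type and degree are interchangeable in the permuted graphs.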

Does that make sense?

dhimmel • May 26 '25 21:05