tskit: Use divergence_matrix for downstream statistics


Open jeromekelleher opened this issue 2 years ago • 3 comments

I think we can rephrase at least genetic_relatedness (aka eGRM) in terms of divergence_matrix, which should substantially improve performance (although we're waiting on #2779, which is needed for decent site-mode performance).

Can we transform the divergence matrix into genetic_relatedness efficiently in Python (i.e., using NumPy), or do we need C code for this, @petrelharp?

Are there other stats we can do this for?

Jul 07 '23 11:07 jeromekelleher

We'd need to consider the compatibility issues this raises, of course. For one, we'll be computing something slightly different in site mode after this, I guess?

Jul 07 '23 11:07 jeromekelleher

Let's see: we talked through how to do this somewhere. The missing piece is a function that computes, for each node, the total area from the node to the root (that's in branch mode; in site mode it's the number of mutations). Call this derived; then relatedness[i,j] = derived[i] + derived[j] - divergence[i,j].
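A minimal NumPy sketch of that transformation in branch mode, on a single hand-made tree. The toy tree and the helper names (`ancestors`, `root_distance`, `path_length`, `shared_length`) are hypothetical illustrations, not tskit API. For one tree, the path between samples i and j has length d_i + d_j - 2·d_mrca, so derived[i] + derived[j] - divergence[i, j] works out to twice the branch length that i and j share; whether a factor of 1/2 (or centring) is applied depends on the normalisation convention chosen for relatedness. Once derived is known, the matrix step itself is a single broadcast.

```python
import numpy as np

# Hypothetical toy tree: samples 0-3; node 4 = MRCA(0, 1), node 5 = MRCA(2, 3),
# node 6 = root. parent[u] == -1 marks the root.
parent = np.array([4, 4, 5, 5, 6, 6, -1])
time = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0])
samples = [0, 1, 2, 3]
root = 6


def ancestors(u):
    """Nodes from u up to the root, inclusive."""
    path = [int(u)]
    while parent[u] != -1:
        u = parent[u]
        path.append(int(u))
    return path


def branch_length(u):
    """Length of the edge above node u (the root has no such edge)."""
    return time[parent[u]] - time[u]


def root_distance(u):
    """'derived' in branch mode: total branch length from u to the root."""
    return sum(branch_length(v) for v in ancestors(u) if v != root)


def path_length(i, j):
    """Branch-mode divergence: edges above nodes in exactly one ancestor set."""
    on_path = set(ancestors(i)) ^ set(ancestors(j))
    return sum(branch_length(u) for u in on_path)


def shared_length(i, j):
    """Branch length ancestral to both i and j (computed directly, for checking)."""
    shared = (set(ancestors(i)) & set(ancestors(j))) - {root}
    return sum(branch_length(u) for u in shared)


divergence = np.array([[path_length(i, j) for j in samples] for i in samples])
derived = np.array([root_distance(u) for u in samples])

# One broadcast turns the divergence matrix into the relatedness-style matrix.
relatedness = derived[:, None] + derived[None, :] - divergence

# Check the identity d_i + d_j - path(i, j) == 2 * shared(i, j) on every pair.
shared = np.array([[shared_length(i, j) for j in samples] for i in samples])
assert np.allclose(relatedness, 2 * shared)
print(relatedness)
```

Here relatedness[0, 1] is 4, twice the length-2 branch above node 4, while samples on opposite sides of the root get 0.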

HOWEVER, your point about back mutations is an important one. I think we argued that it was OK for divergence_matrix and divergence to give slightly different answers; if that's true, then relatedness_matrix and relatedness could also give slightly different answers?
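A toy single-site example of how back mutations can make two site-mode quantities disagree. It assumes, as a simplification for illustration only, that the matrix-based route effectively counts mutations on the path between two samples, whereas a pairwise statistic compares the samples' alleles. The tree, the mutation layout and the helper names are all hypothetical.

```python
# Hypothetical toy tree: samples 0, 1, 2; node 3 = MRCA(0, 1); node 4 = root.
parent = {0: 3, 1: 3, 2: 4, 3: 4, 4: None}
# One site, at most one mutation per edge, keyed by the node below the edge:
# a forward mutation above node 3 (0 -> 1) and a back mutation above sample 0 (1 -> 0).
mutations = {3: 1, 0: 0}
ANCESTRAL = 0


def ancestors(u):
    """Nodes from u up to the root, inclusive."""
    path = [u]
    while parent[u] is not None:
        u = parent[u]
        path.append(u)
    return path


def genotype(u):
    """Allele at u: set by the nearest mutation on or above u."""
    for v in ancestors(u):
        if v in mutations:
            return mutations[v]
    return ANCESTRAL


def allele_difference(i, j):
    """1 if the two samples carry different alleles, else 0."""
    return int(genotype(i) != genotype(j))


def mutations_on_path(i, j):
    """Number of mutations on edges along the path between i and j."""
    on_path = set(ancestors(i)) ^ set(ancestors(j))
    return sum(1 for u in on_path if u in mutations)


for i, j in [(0, 1), (0, 2), (1, 2)]:
    print((i, j), allele_difference(i, j), mutations_on_path(i, j))
```

The two measures agree for (0, 1) and (1, 2), but for (0, 2) both samples carry the ancestral allele while the path between them crosses two mutations, which is the kind of discrepancy that would need documenting.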

Jul 09 '23 04:07 petrelharp

Ah yes, that makes sense. Given that we need to compute derived per window, it's probably simpler to do this in C than to try to come up with NumPy tricks.
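Why derived has to be computed per window can be seen in a small pure-NumPy sketch. It uses a hypothetical stand-in for a tree sequence (shared node times plus a list of (left, right, parent-array) trees), which is not how tskit stores trees, and the function names are made up. Each node's area to the root is its root distance in each tree multiplied by how much of that tree's interval overlaps the window; note that node 3 has a different root distance in the two trees.

```python
import numpy as np

# Hypothetical stand-in for a tree sequence: node times shared by all trees,
# plus (left, right, parent) trees; parent[u] == -1 means u has no parent.
time = np.array([0.0, 0.0, 0.0, 1.0, 2.0, 3.0])
trees = [
    (0.0, 40.0, np.array([3, 3, 4, 4, -1, -1])),  # ((0,1),2), root 4
    (40.0, 100.0, np.array([5, 3, 3, 5, -1, -1])),  # (0,(1,2)), root 5
]
windows = [0.0, 50.0, 100.0]


def root_distances(parent, time):
    """Branch length from every node up to its root in one tree."""
    dist = np.zeros(len(parent))
    for u in range(len(parent)):
        v = u
        while parent[v] != -1:
            dist[u] += time[parent[v]] - time[v]
            v = parent[v]
    return dist


def windowed_derived(trees, windows, time):
    """Per-window area (branch length x genomic span) from each node to its root."""
    num_windows = len(windows) - 1
    out = np.zeros((num_windows, len(time)))
    for left, right, parent in trees:
        dist = root_distances(parent, time)
        for w in range(num_windows):
            # Length of the overlap between this tree's interval and window w.
            overlap = min(right, windows[w + 1]) - max(left, windows[w])
            if overlap > 0:
                out[w] += overlap * dist
    return out


derived = windowed_derived(trees, windows, time)
print(derived)
```

This recomputes root distances from scratch for every tree; an efficient implementation would likely update them incrementally as edges change between adjacent trees, which is the sort of bookkeeping that is easier to write in C than to express with NumPy vectorisation.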

So, we create a C function genetic_relatedness_matrix, following the pattern of divergence_matrix, and expose it to Python in the standard way?

I think it's fine for the *_matrix functions to have slightly different semantics; we just need to document it clearly.

Jul 09 '23 08:07 jeromekelleher

This was done in #2823; see #1623 for documentation.

Sep 25 '24 04:09 petrelharp
