
extend edges misses some extendable edges

Open petrelharp opened this issue 1 year ago • 0 comments

Say we have a true situation like this:

   p    |    p    |    p    | 
   |    |    |    |    |    | 
   a    |    a    |    a    | 
 /   \  |  /      |  /      | 
b    c1 | b       | b       | 
|       | |       | | \     | 
c2      | c2      | c2 c3   |    

Currently, we can't extend through this: extend_edges gets as far as the state below and then gets stuck:

   p    |    p    |    p    | 
   |    |    |    |    |    | 
   a    |    a    |    /    | 
 /   \  |  /      |  /      | 
/    c1 | /       | b       | 
|       | |       | | \     | 
c2      | c2      | c2 c3   |    

... and @nspope has shown that this sort of thing is fairly common even in moderate simulated tree sequences.

Brainstorming with @nspope and @hfr1tz3, we think the principle to use here is: if there is a chain p -> x -> c somewhere in the tree sequence, then wherever c inherits from p, x should be intermediate between them. The argument is that otherwise c would have had to inherit two blocks from p along distinct paths, which requires an extra (invisible) coalescence somewhere. (That certainly happens sometimes, but it is less parsimonious.)
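To make the principle concrete, here is a rough sketch in plain Python (not tskit code, and not how extend_edges works). It assumes each tree is a dict mapping child to parent, reads "c inherits from p" as "c is a descendant of p in that tree", and only reports the places where the principle is violated; it does not decide where x should be inserted. The function names are made up for illustration.

```python
def ancestors(parent, node):
    """Return the nodes strictly above `node` in one tree, nearest first."""
    path = []
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def chains(trees):
    """Collect every (p, x, c) where c -> x -> p are consecutive edges in some tree."""
    found = set()
    for parent in trees:
        for c, x in parent.items():
            if x in parent:
                found.add((parent[x], x, c))
    return found


def missing_intermediates(trees):
    """List (tree_index, p, x, c) where c descends from p but x is not between them."""
    out = []
    for p, x, c in sorted(chains(trees)):
        for i, parent in enumerate(trees):
            path = ancestors(parent, c)
            if p in path and x not in path[: path.index(p)]:
                out.append((i, p, x, c))
    return out


# The "stuck" state from the second diagram, one dict per tree (child -> parent).
stuck = [
    {"a": "p", "c1": "a", "c2": "a"},
    {"a": "p", "c2": "a"},
    {"b": "p", "c2": "b", "c3": "b"},
]

# The true trees from the first diagram.
truth = [
    {"a": "p", "b": "a", "c1": "a", "c2": "b"},
    {"a": "p", "b": "a", "c2": "b"},
    {"a": "p", "b": "a", "c2": "b", "c3": "b"},
]

print(missing_intermediates(stuck))
print(missing_intermediates(truth))
```

On the stuck state this flags a as missing above c2 in the third tree and b as missing above c2 in the first two trees; on the true trees it flags nothing. Working out where on the path the missing node goes would presumably need node times, which this sketch ignores.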

The next step is to verify how serious this is and assess how hard it is to fix.

petrelharp, Jan 09 '24 17:01