Better logic in remove_buffer when building ancestors, esp with missing data

Open hyanwong opened this issue 5 years ago • 2 comments

In https://github.com/tskit-dev/tsinfer/blob/62b0135d636821d9e1100c1e3d180f871724cf47/tsinfer/algorithm.py#L173 we use a remove_buffer so that a sample that conflicts with the ancestor at one site is not dropped straight away when building the ancestor, but only if the conflict continues at the next (older) site along. However, where there is missing data, it may be unclear whether the next (older) site actually conflicts. In this case, I suspect we should not clear the buffer, but merely advance to the next site with the buffer intact.
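A minimal sketch of the proposed behaviour, written in plain Python rather than taken from tsinfer's `algorithm.py`. It assumes genotypes coded 0/1 with -1 for missing, a simple majority consensus among the current samples (ties going to 1), and a hypothetical function name `extend_ancestor`; tsinfer's real buffer handling and stopping rule may differ. A buffered sample whose genotype is missing at the next site stays in the buffer instead of being cleared, and a site with no calls among the current samples is skipped entirely.

```python
from typing import List, Sequence, Set, Tuple

MISSING = -1  # assumed missing-data code (tskit uses -1 for missing genotypes)


def extend_ancestor(
    older_sites: Sequence[Sequence[int]], focal_samples: Set[int]
) -> Tuple[List[int], Set[int]]:
    """Hypothetical sketch, not tsinfer's actual implementation.

    older_sites: genotype rows (one 0/1/MISSING entry per sample) for the
        sites older than the focal site, in the order they are visited.
    focal_samples: samples carrying the derived allele at the focal site.
    Returns the inferred state at each site (MISSING where the site was
    skipped) and the samples still supporting the ancestor at the end.
    """
    samples = set(focal_samples)
    remove_buffer: Set[int] = set()
    states: List[int] = []
    for g in older_sites:
        called = [u for u in samples if g[u] != MISSING]
        if not called:
            # No calls among the current samples: skip the site and keep
            # the buffer intact rather than clearing it
            states.append(MISSING)
            continue
        ones = sum(g[u] for u in called)
        consensus = 1 if ones >= len(called) - ones else 0  # ties go to 1 (assumed)
        states.append(consensus)
        conflicting = {u for u in called if g[u] != consensus}
        # A buffered sample that conflicts again is dropped from the sample set
        samples -= remove_buffer & conflicting
        # A buffered sample with missing data here is undecided, so it stays
        # buffered; a buffered sample that now agrees leaves the buffer
        undecided = {u for u in remove_buffer if g[u] == MISSING}
        # Samples conflicting for the first time are added to the buffer
        remove_buffer = undecided | (conflicting - remove_buffer)
    return states, samples
```

For example, with samples {0, 1, 2, 3}, sample 3 conflicts at the first older site, is missing at the second, and conflicts again at the third, so it is removed there. If the buffer had been cleared at the missing-data site, sample 3 would only have been re-buffered at the third site rather than removed.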

hyanwong, Feb 07 '20 17:02

Sounds good. Any site with all missing data should be ignored I think, as a guiding principle.
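As a sketch of that guiding principle, sites with no non-missing calls at all could be filtered out before ancestor building. This assumes a NumPy genotype matrix laid out as (num_sites, num_samples) with -1 for missing data; `informative_site_indices` is a hypothetical helper, not a tsinfer function.

```python
import numpy as np

MISSING = -1  # assumed missing-data code (tskit.MISSING_DATA is -1)


def informative_site_indices(genotypes: np.ndarray) -> np.ndarray:
    """Return indices of sites (rows) with at least one non-missing call.

    genotypes: 2D array of shape (num_sites, num_samples), assumed layout.
    """
    has_call = np.any(genotypes != MISSING, axis=1)
    return np.flatnonzero(has_call)
```

Sites dropped this way never reach the buffer logic, so they cannot clear it.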

jeromekelleher, Feb 10 '20 11:02

I think the new missing-data code (e.g. https://github.com/tskit-dev/tsinfer/blob/a27240b3e214e0ca05d88b1e3b9022e029284a34/tsinfer/algorithm.py#L180) still needs this logic implemented. It correctly checks for missing data when removing samples from the included clade, but I think extra logic is needed to decide whether or not to clear the buffer.

hyanwong, Apr 01 '20 21:04