tsinfer
Better logic in remove_buffer when building ancestors, esp with missing data
In https://github.com/tskit-dev/tsinfer/blob/62b0135d636821d9e1100c1e3d180f871724cf47/tsinfer/algorithm.py#L173 we use a remove_buffer so that we don't immediately change how we build the ancestor when we reach a conflicting site, but only do so if the conflict continues at the next (older) site along. However, where there is missing data, it may be unclear whether the next (older) site actually conflicts. In this case, I suspect we should not clear the buffer, but merely advance to the next site with the buffer intact.
Sounds good. As a guiding principle, I think any site with all missing data should be ignored.
I think the new missing-data code (e.g. https://github.com/tskit-dev/tsinfer/blob/a27240b3e214e0ca05d88b1e3b9022e029284a34/tsinfer/algorithm.py#L180) still needs this logic implemented. It correctly checks for missing data when removing samples from the included clade, but I think extra logic is needed to decide whether or not to clear the buffer.
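For discussion, here is a minimal, simplified sketch of the proposed buffer rule. It is not tsinfer's actual code: the function name extend_ancestor, the input layout (one list of 0/1/missing genotypes per older site, already in the order they are visited) and the MISSING = -1 constant (assumed to mirror tskit's missing-data convention) are all hypothetical. It also ignores the real algorithm's left/right passes, time filtering and stopping rule. A buffered sample that conflicts again is removed. One that agrees leaves the buffer. One whose genotype is missing stays in the buffer while we advance. A site where every sample in the clade is missing is skipped without touching the buffer.

```python
from typing import List, Sequence, Set, Tuple

MISSING = -1  # assumption: mirrors tskit's missing-data value of -1


def extend_ancestor(
    genotypes: Sequence[Sequence[int]], focal_samples: Set[int]
) -> Tuple[List[int], Set[int]]:
    """Hypothetical sketch of remove_buffer handling with missing data.

    genotypes[k][u] is sample u's allele (0, 1 or MISSING) at the k-th older
    site visited. Returns the consensus allele chosen at each site (MISSING
    where the site was skipped) and the samples still in the clade.
    """
    samples = set(focal_samples)
    remove_buffer: Set[int] = set()
    ancestor: List[int] = []
    for site in genotypes:
        observed = [u for u in samples if site[u] != MISSING]
        if not observed:
            # Every sample in the clade is missing: ignore this site entirely
            # and carry the buffer over unchanged.
            ancestor.append(MISSING)
            continue
        ones = sum(site[u] for u in observed)
        consensus = 1 if ones >= len(observed) - ones else 0
        ancestor.append(consensus)

        next_buffer: Set[int] = set()
        for u in remove_buffer:
            if site[u] == MISSING:
                next_buffer.add(u)  # unclear: keep buffered and just advance
            elif site[u] != consensus:
                samples.discard(u)  # conflict continues: drop from the clade
            # otherwise u agrees here and simply leaves the buffer
        for u in samples:
            if site[u] != MISSING and site[u] != consensus:
                next_buffer.add(u)  # first conflict: buffer, don't remove yet
        remove_buffer = next_buffer
    return ancestor, samples


if __name__ == "__main__":
    sites = [[1, 1, 0], [1, 1, MISSING], [MISSING] * 3, [0, 0, 1]]
    print(extend_ancestor(sites, {0, 1, 2}))
```

In this example sample 2 conflicts at the first site, is missing at the second, the third site is entirely missing, and it conflicts again at the fourth, so it is removed. If the buffer were cleared at the missing site instead, sample 2 would have stayed in the clade.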