tsinfer
Root node in real-data inferred tree sequence has huge number of children
The root node of the TGP tree sequence has hundreds of thousands of child edges, a large proportion of which (>50%) are sample edges. This can cause issues with tsdate. Would you say this is expected, @jeromekelleher? Perhaps sample nodes attach directly to the root when no inferred ancestor proves to be a good match?
I should say that this node is not the root everywhere, but is the root for a substantial portion (perhaps the majority) of the chromosome.
I think it's an artefact of the current exact-matching-only approach - hopefully this will be reduced when we've tuned the new recombination/mutation rate parameters.
I think this is fixed by https://github.com/tskit-dev/tsinfer/pull/687
Can we check whether this is now fixed in tsinfer 0.3 and, if so, close this issue, @awohns? Perhaps @szhan would be able to help make a new TGP tree sequence using the pipeline from, e.g., the unified genealogy paper, and compare the distribution of the number of children per node for the root nodes?
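
For the comparison, something like the following could serve as a starting point. It is only a sketch: `child_edge_summary` is a hypothetical helper (not part of tsinfer or tskit), and it works on plain `(parent, child)` pairs so it can be tried without a real tree sequence. For each parent node it counts child edges and the fraction of those edges that go to sample nodes.

```python
from collections import Counter


def child_edge_summary(edges, samples):
    """Hypothetical helper: summarise child edges per parent node.

    edges   -- iterable of (parent, child) node ID pairs, one per edge
    samples -- iterable of sample node IDs
    Returns {parent: (n_child_edges, fraction_of_edges_to_samples)}.
    """
    samples = set(samples)
    total = Counter()
    to_samples = Counter()
    for parent, child in edges:
        total[parent] += 1
        if child in samples:
            to_samples[parent] += 1
    return {p: (n, to_samples[p] / n) for p, n in total.items()}


# Toy data (made up for illustration): nodes 0-2 are samples, 4 is the root.
edges = [(4, 0), (4, 1), (4, 3), (3, 2), (4, 2)]
summary = child_edge_summary(edges, samples=[0, 1, 2])

# The parent with the most child edges, and how sample-heavy it is.
busiest = max(summary, key=lambda p: summary[p][0])
print(busiest, summary[busiest])
```

With a tskit tree sequence `ts`, the inputs could be supplied as `((e.parent, e.child) for e in ts.edges())` and `ts.samples()`. Note that this counts edges rather than distinct children, so a child attached to the same parent over several separate intervals is counted once per edge.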