msprime
msprime copied to clipboard
Add special case for merging non-overlapping segments
Our analysis of the running time of Hudson's algorithm in the msprime 1.0 paper predicts that a lot of the time spent for long genomes will be in events that merge two widely separated chunks of ancestry. It may therefore be worthwhile adding a special case in the merge_ancestors
code path to deal with this case, where given two segment chains a
and b
, we have the right-most segment of a
is < the left most segment of b
. However, we currently don't record the extremities of the segments per individual, so we would need to refactor things somewhat.
This would also only be worth doing if the number of segments in a
and b
we reasonably large, so a first step would be to do some exploratory work to see what the average number of segments in this "gather" phase of the simulation really is. If anyone is interested in making their Drosophila simulations go faster, then this would be a good place to start.