tsinfer

Improve ancestor fetching

Open benjeffery opened this issue 1 year ago • 0 comments

Since #828 was merged, we no longer access ancestors in order when matching, and we create a separate chunk_iterator for each ancestor grouping. For large datasets on high-latency filesystems, we are now spending more time reading ancestors than matching them! This could be fixed by some sort of chunk cache that the iterator uses, along with caching the other, non-genotype arrays that are currently read for every chunk_iterator.
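A rough sketch of the kind of chunk cache I have in mind, to make the idea concrete. This is not tsinfer code: `ChunkCache`, `load_chunk`, `rows` and the sizes are all hypothetical names for illustration, and the "storage" is just a Python list standing in for a chunked array on a slow filesystem. The point is only that several out-of-order iterators can share one LRU cache, so each chunk is read and decoded once while it stays in the cache.

```python
from collections import OrderedDict


class ChunkCache:
    """Hypothetical LRU cache of decoded chunks, shared by several iterators."""

    def __init__(self, load_chunk, chunk_size, max_chunks=8):
        self.load_chunk = load_chunk  # callable: chunk index -> sequence of rows
        self.chunk_size = chunk_size
        self.max_chunks = max_chunks
        self._chunks = OrderedDict()
        self.loads = 0  # number of times we actually hit the backing store

    def get_chunk(self, chunk_index):
        if chunk_index in self._chunks:
            # Cache hit: mark the chunk as most recently used.
            self._chunks.move_to_end(chunk_index)
            return self._chunks[chunk_index]
        # Cache miss: read from the (slow) backing store.
        chunk = self.load_chunk(chunk_index)
        self.loads += 1
        self._chunks[chunk_index] = chunk
        if len(self._chunks) > self.max_chunks:
            # Evict the least recently used chunk.
            self._chunks.popitem(last=False)
        return chunk

    def rows(self, indexes):
        """Yield rows for arbitrary (not necessarily ordered) row indexes."""
        for j in indexes:
            chunk = self.get_chunk(j // self.chunk_size)
            yield chunk[j % self.chunk_size]


# Stand-in for a chunked ancestors array: 100 rows in chunks of 10.
data = list(range(100))


def load_chunk(k):
    return data[k * 10 : (k + 1) * 10]


cache = ChunkCache(load_chunk, chunk_size=10, max_chunks=2)
# Two "ancestor groupings" that touch the same chunks out of order.
group_a = list(cache.rows([3, 17, 5]))
group_b = list(cache.rows([12, 4]))
print(group_a, group_b, cache.loads)
```

Here both groupings only touch chunks 0 and 1, so the second grouping is served entirely from the cache. The same object could also hold the small non-genotype arrays, read once up front instead of once per iterator.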

benjeffery • Jun 14 '23 08:06