Automatically generate and cache minimap2 indexes to eliminate redundant indexing overhead

Open bede opened this issue 1 year ago • 0 comments

When I first implemented long read support using minimap2, I was unable to demonstrate a significant reduction in execution time from using a prebuilt index compared with rebuilding the index from scratch every time hostile clean is called. A prebuilt index was only marginally quicker, and frankly not worth the complexity of managing indexes. However, I recently retested this and observed that running hostile clean on a small long read FASTQ drops from ~45s to ~7s when a precomputed index is used.

This behaviour should first be characterised and verified on both Linux and macOS. Assuming the performance benefit is replicated on both platforms, index caching and reuse that is invisible to the user (but suitably logged) should be added, unless a good reason not to do so becomes apparent.
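
As a rough illustration of what logged index caching could look like, here is a minimal Python sketch. It is not hostile's implementation: the function names (index_path_for, build_index, get_or_build_index), the cache directory layout and the map-ont preset are all assumptions made for the example. The only minimap2 behaviour relied on is that `minimap2 -x PRESET -d INDEX REFERENCE` writes an index file. The preset is part of the cache key because an index built with one preset's k-mer and window settings should not be reused with another.

```python
import hashlib
import logging
import subprocess
from pathlib import Path

logger = logging.getLogger(__name__)


def index_path_for(reference: Path, cache_dir: Path, preset: str = "map-ont") -> Path:
    """Derive a cache file name from the reference path, size, mtime and preset.

    Hypothetical naming scheme: any change to the reference file or preset
    produces a different file name, so a stale index is never reused.
    """
    stat = reference.stat()
    key = f"{reference.resolve()}|{stat.st_size}|{stat.st_mtime_ns}|{preset}"
    digest = hashlib.sha256(key.encode()).hexdigest()[:16]
    return cache_dir / f"{reference.name}.{preset}.{digest}.mmi"


def build_index(reference: Path, index: Path, preset: str) -> None:
    """Build an index by running `minimap2 -x PRESET -d INDEX REFERENCE`."""
    subprocess.run(
        ["minimap2", "-x", preset, "-d", str(index), str(reference)],
        check=True,
    )


def get_or_build_index(reference, cache_dir, preset="map-ont", builder=build_index) -> Path:
    """Return a cached index for the reference, building it only if missing."""
    reference = Path(reference)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    index = index_path_for(reference, cache_dir, preset)
    if index.exists():
        logger.info("Reusing cached minimap2 index %s", index)
        return index
    logger.info("Building minimap2 index %s", index)
    # Build to a temporary name, then rename, so an interrupted build
    # never leaves a partial file at the final cache path.
    tmp = index.with_name(index.name + ".tmp")
    builder(reference, tmp, preset)
    tmp.replace(index)
    return index
```

The builder argument exists only so the caching logic can be exercised without minimap2 installed. In real use the returned path would be passed to minimap2 in place of the FASTA reference.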

This should substantially reduce execution time when processing many long read samples, where the redundant indexing overhead is painful.

Jun 14 '24 16:06 bede