eager
Deduplication Parallelization
We could consider splitting BAMs, since DeDup/MarkDuplicates normally takes quite some time, and use a condition such as file.size>2GB (or a similar operator) to decide when to split. This should speed things up significantly. A subsequent merge would only take a matter of minutes and would automatically produce the same output for downstream analysis as before.
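A minimal Python sketch of the size-gated split described above. It assumes the BAM is split by contig, on the reasoning that single-end or merged reads on different contigs cannot be positional duplicates of each other. It groups contigs into roughly balanced shards that could be deduplicated in parallel, and it falls back to a single whole-file job below the threshold. The helper name plan_dedup_shards, the shard count, and the greedy balancing strategy are hypothetical illustrations, not part of eager.

```python
from typing import Dict, List

# Hypothetical threshold mirroring the "file.size>2GB" example; treat as tunable.
SPLIT_THRESHOLD_BYTES = 2 * 1024**3


def plan_dedup_shards(
    bam_size_bytes: int, contig_lengths: Dict[str, int], n_shards: int
) -> List[List[str]]:
    """Group contigs into at most n_shards lists with roughly balanced total length.

    Below the threshold (or when n_shards <= 1) a single shard holding every
    contig is returned, i.e. deduplication runs on the whole BAM as before.
    """
    contigs = list(contig_lengths)
    if bam_size_bytes <= SPLIT_THRESHOLD_BYTES or n_shards <= 1:
        return [contigs]

    shards: List[List[str]] = [[] for _ in range(min(n_shards, len(contigs)))]
    totals = [0] * len(shards)
    # Greedy longest-first: give each contig to the shard with the smallest total length.
    for name in sorted(contigs, key=lambda c: contig_lengths[c], reverse=True):
        i = totals.index(min(totals))
        shards[i].append(name)
        totals[i] += contig_lengths[name]
    return shards


if __name__ == "__main__":
    lengths = {"chr1": 100, "chr2": 80, "chr3": 60, "chr4": 40}
    print(plan_dedup_shards(3 * 1024**3, lengths, 2))  # large BAM: split into 2 shards
    print(plan_dedup_shards(1024, lengths, 2))         # small BAM: one whole-file job
```

Each shard would then be extracted, deduplicated, and merged back (for example with samtools merge). Two caveats apply to the "same output" assumption. First, extracting by region from an indexed BAM drops unplaced unmapped reads unless they are handled separately. Second, for paired-end MarkDuplicates, pairs whose mates map to different contigs may be marked differently than in a whole-file run.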
I was about to suggest closing this, as we aren't really promoting DeDup anymore except for niche cases, but I see it also applies to MarkDuplicates, so I renamed the issue.
Done: https://github.com/nf-core/eager/pull/944