
save kmer hash for re-use?

Open • rwhetten opened this issue 4 years ago • 0 comments

I'm working with a large dataset of PacBio CLR reads spread across multiple files, and at the moment Filtlong is running on a single combined file containing all the data (267 Gb). It seems it might be faster if there were an option to build and save a k-mer hash of the Illumina reads used for QC, so that the same hash could be reused by multiple independent processes, each running on an individual file. If fast read access to the hash is important, it could be copied to local scratch space on each node so that every process has its own copy.
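To make the request concrete, here is a rough sketch of the build-once, load-many pattern. It is written in Python for illustration only and is not how Filtlong (a C++ program) stores its k-mers. The function names, the pickle file format, and the toy value of `k` are all assumptions invented for this example.

```python
import os
import pickle
import tempfile


def build_kmer_set(reads, k):
    """Collect every k-mer seen in the reference (e.g. Illumina) reads."""
    kmers = set()
    for seq in reads:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return kmers


def save_kmer_set(kmers, path):
    """Write the k-mer set to disk once so later processes can skip the build."""
    with open(path, "wb") as fh:
        pickle.dump(kmers, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_kmer_set(path):
    """Load a previously saved k-mer set (hypothetical on-disk format: pickle)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def kmer_match_fraction(read, kmers, k):
    """Fraction of a long read's k-mers that also occur in the saved set."""
    total = len(read) - k + 1
    if total <= 0:
        return 0.0
    read = read.upper()
    hits = sum(1 for i in range(total) if read[i:i + k] in kmers)
    return hits / total


if __name__ == "__main__":
    # Toy data standing in for the Illumina reference reads.
    illumina_reads = ["ACGTACGTAC", "TTGACCGTAA"]
    k = 4  # toy value chosen for the example

    # Step 1: one process builds and saves the hash.
    path = os.path.join(tempfile.mkdtemp(), "illumina_kmers.pkl")
    save_kmer_set(build_kmer_set(illumina_reads, k), path)

    # Step 2: each independent process loads its own copy and scores its file.
    shared = load_kmer_set(path)
    print(kmer_match_fraction("ACGTACGT", shared, k))
```

In a cluster setting, the saved file would be copied to each node's local scratch space before step 2, so that every process reads its own copy rather than contending for a shared filesystem.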

rwhetten • Jul 29 '21 17:07