fastp
adapter detection gets stuck and uses all memory on ONT data
Hello,
I'm using fastp in a pipeline and found that some samples fail with out-of-memory errors during the fastp step. I've attached a small example: small.fq.gz
I'm using fastp 0.24.1 and simply run:
fastp -i small.fq.gz -o clean.fastq.gz
It displays:
Detecting adapter sequence for read1...
but then gets stuck while memory usage keeps growing (100 GB+ of RAM).
There are some very short reads so I tried filtering any <100bp long first, but that doesn't fix the issue.
If I disable adapter trimming (-A / --disable_adapter_trimming), the sample runs completely fine.
Thanks, and let me know if you need any more info.
I've been running into memory issues as well (albeit on short-read sequencing data) and took a quick look at the adapter trimming logic using your test file. I saw the same spike in memory usage, and it looks like it's occurring in the Evaluator::getAdapterWithSeed function: some FASTQ entries produce dozens or hundreds of candidate adapter hits, all of which are stored and then used to determine the actual adapter sequence.
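To illustrate the pattern, here is a minimal Python sketch. It is not fastp's code: the function collect_candidates and the parameters tail_len and max_hits_per_read are hypothetical names made up for this example. It only shows how a repetitive read can yield many candidate hits for one seed when every hit is stored, and how a per-read cap would bound that.

```python
def collect_candidates(reads, seed, tail_len=8, max_hits_per_read=None):
    """Collect the tail_len bases following each occurrence of seed.

    Hypothetical illustration only; not fastp's implementation.
    With max_hits_per_read=None every hit is stored, so a long,
    repetitive read can contribute a very large number of candidates.
    """
    candidates = []
    for read in reads:
        hits = 0
        pos = read.find(seed)
        while pos != -1:
            # Optional cap: stop storing hits from this read once reached.
            if max_hits_per_read is not None and hits >= max_hits_per_read:
                break
            start = pos + len(seed)
            tail = read[start:start + tail_len]
            # Only keep full-length tails as adapter candidates.
            if len(tail) == tail_len:
                candidates.append(tail)
                hits += 1
            # Continue searching, allowing overlapping occurrences.
            pos = read.find(seed, pos + 1)
    return candidates


if __name__ == "__main__":
    repetitive_read = "ACGT" * 50  # one 200 bp read with a repeated motif
    unbounded = collect_candidates([repetitive_read], "ACGT")
    bounded = collect_candidates([repetitive_read], "ACGT", max_hits_per_read=3)
    print(len(unbounded), len(bounded))
```

In this toy case a single 200 bp read gives 48 stored candidates without the cap and 3 with it. Whether a cap like this is the right fix for fastp is an assumption on my part; it is just one way to keep the candidate list bounded.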
I will fix this soon