fastp
memory steadily increases as run continues
There have been other issues about memory, but none of them addresses what I am seeing. I am running fastp 0.23.2 in a stream as follows:
fastp \
--stdin --interleaved_in --stdout \
--dont_eval_duplication \
--length_required 25 \
--merge --include_unmerged --correction \
--html mySample.html --json mySample.json \
--report_title mySample 2>/dev/null |
Memory usage continually climbs as long as the job runs. On large data sets I am getting OOM job kills on my cluster. The job below went from RES 4.9g to 5.3g as I typed this message... (%CPU is low at the moment I captured this because the downstream task is busy - fastp is, indeed, much faster than the aligner!).
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
89783 wilsonte 20 0 5306760 4.9g 1828 S 2.0 2.6 49:47.81 fastp
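For anyone who wants to watch the growth themselves, here is a rough shell loop I use to log fastp's resident memory once a minute (the PID is of course specific to my job; adjust to yours):

PID=89783                                  # fastp process to watch
while kill -0 "$PID" 2>/dev/null; do       # loop until the process exits
    echo "$(date '+%H:%M:%S') $(ps -o rss= -p "$PID")" >> fastp_mem.log   # RSS in kB
    sleep 60
done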
Is this expected behavior? Memory leak? User error?
I think I am having the same issue. Were you able to resolve this problem?
No - I have no true resolution yet. I am just letting the memory accumulate and giving the jobs enough resources that they don't get killed by our server rules. I'd love to have it addressed properly as this approach might not scale to all data sets...
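For what it's worth, if your scheduler is SLURM, that just means over-requesting memory in the batch script, something like the following (the number is a per-data-set guess, not a recommendation):

#SBATCH --mem=64G            # over-provision so the climbing RES does not hit the cgroup limit
#SBATCH --cpus-per-task=4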
Any updates on this? My run needs >128 GB of memory for two 25 GB paired-end fastq.gz files.
Which version of fastp did you use? Could you please upload your data, and paste your command here?
I am unclear if you were asking me or the others who posted on this issue - I included my version and command in my original post above. Thanks.
I'm on version 0.23.2. I'm calling fastp as a Python subprocess, equivalent to calling this from Bash:
fastp -w 8 -i in1.fastq.gz -I in2.fastq.gz \
-o out1.fastq.gz -O out2.fastq.gz -h report.html -j report.json \
--cut_tail -A -m --merged_out merged_out.fastq.gz &> log.txt
Adding --dont_eval_duplication does not seem to reduce memory usage.
I'm having the same issue. Could it have to do with the HTML or JSON reports?
Last lines of the log:
[10:30:13] cleaned.R2.fq.gz writer finished
[10:30:15] cleaned.R1.fq.gz writer finished
[10:30:17] start to generate reports
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=1251474.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
@sfchen - any chance this memory accumulation issue will be attended to? The release notes for v0.23.3 and 0.23.4 do not appear to address it.
@philippbayer asked whether it might be due to the reporting; might that be true? (I am writing both HTML and JSON reports.)
Thanks for any help.
I had the same issue with both 0.23.2 and 0.23.4 when processing Nanopore read data. It consumed more than 128 GB of memory for 100 Mb of reads. --dont_eval_duplication did not help; --disable_adapter_trimming did.
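For reference, a minimal sketch of the kind of single-end Nanopore invocation where adding --disable_adapter_trimming kept memory under control for me (file names are placeholders):

fastp \
--disable_adapter_trimming \
-i nanopore_reads.fastq.gz \
-o trimmed.fastq.gz \
--html report.html --json report.json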