memory steadily increases as run continues

Open wilsonte-umich opened this issue 3 years ago • 14 comments

There were other issues about memory, but none of them resolved or addressed what I am seeing. I run fastp 0.23.2 in a stream as follows:

fastp \
--stdin --interleaved_in --stdout \
--dont_eval_duplication \
--length_required 25 \
--merge --include_unmerged --correction \
--html mySample.html --json mySample.json \
--report_title mySample 2>/dev/null |

Memory usage continually climbs as long as the job runs. On large data sets I am getting OOM job kills on my cluster. The job below went from RES 4.9g to 5.3g as I typed this message... (%CPU is low at the moment I captured this because the downstream task is busy - fastp is, indeed, much faster than the aligner!).

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                                                                                      
 89783 wilsonte  20   0 5306760   4.9g   1828 S   2.0  2.6  49:47.81 fastp
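
A loop like the following can log the growth over time (a minimal sketch; assumes Linux and that exactly one fastp process is running):

# sample fastp's resident set size once a minute until it exits
PID=$(pgrep -x fastp)
while kill -0 "$PID" 2>/dev/null; do
    grep VmRSS /proc/"$PID"/status
    sleep 60
done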

Is this expected behavior? Memory leak? User error?

wilsonte-umich avatar Feb 14 '22 21:02 wilsonte-umich

I think I am having the same issue. Were you able to resolve this problem?

aminards avatar Mar 21 '22 16:03 aminards

No - I have no true resolution yet. I am just letting the memory accumulate and giving the jobs enough resources that they don't get killed by our server rules. I'd love to have it addressed properly as this approach might not scale to all data sets...

wilsonte-umich avatar Mar 21 '22 17:03 wilsonte-umich

Any updates on this? My run needs >128 GB of memory for two 25 GB paired-end fastq.gz files.

bbremer avatar Jan 04 '23 14:01 bbremer

Which version of fastp did you use? Could you please upload your data, and paste your command here?

sfchen avatar Jan 05 '23 00:01 sfchen

I am unclear whether you were asking me or the others who posted on this issue; I included my version and command in my original post above. Thanks.

wilsonte-umich avatar Jan 08 '23 16:01 wilsonte-umich

I'm on version 0.23.2. I'm calling fastp as a Python subprocess, equivalent to calling this from Bash:

fastp -w 8 -i in1.fastq.gz -I in2.fastq.gz \
-o out1.fastq.gz -O out2.fastq.gz \
-h report.html -j report.json \
--cut_tail -A -m --merged_out merged_out.fastq.gz &> log.txt

Adding --dont_eval_duplication does not seem to reduce memory usage.
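
One way to compare flag combinations is to record peak RSS with GNU time (a minimal sketch; assumes /usr/bin/time is GNU time, and the filenames are placeholders):

/usr/bin/time -v fastp -w 8 -i in1.fastq.gz -I in2.fastq.gz \
-o out1.fastq.gz -O out2.fastq.gz 2> fastp_time.log
# peak memory, reported in kbytes:
grep "Maximum resident set size" fastp_time.log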

bbremer avatar Jan 12 '23 16:01 bbremer

I'm having the same issue. Could it have to do with the HTML or JSON reports?

Last lines of the log:

[10:30:13] cleaned.R2.fq.gz writer finished
[10:30:15] cleaned.R1.fq.gz writer finished
[10:30:17] start to generate reports

slurmstepd: error: Detected 1 oom-kill event(s) in StepId=1251474.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
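
As a stopgap, over-requesting memory in the job script avoids the cgroup kill (a sketch; the values are illustrative, not a recommendation):

#!/bin/bash
#SBATCH --mem=64G    # size generously; peak RSS grows the longer the run continues
srun fastp -i in1.fastq.gz -I in2.fastq.gz -o out1.fastq.gz -O out2.fastq.gz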

philippbayer avatar Apr 06 '23 10:04 philippbayer

@sfchen - any chance this memory accumulation issue will be attended to? The release notes for v0.23.3 and v0.23.4 do not appear to address it.

@philippbayer queried whether it might be due to the reporting; might that be true? (I am writing both HTML and JSON reports.)

Thanks for any help.

wilsonte-umich avatar Jul 12 '23 13:07 wilsonte-umich

Had the same issue with both 0.23.2 and 0.23.4 when processing Nanopore read data. It consumed more than 128 GB of memory for 100 Mb of reads. --dont_eval_duplication did not help; --disable_adapter_trimming did.
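
For reference, a long-read invocation with adapter trimming disabled might look like this (a sketch; the filenames are placeholders):

fastp -i nanopore.fastq.gz -o cleaned.fastq.gz \
--disable_adapter_trimming --dont_eval_duplication \
-j report.json -h report.html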

dawnmy avatar Oct 04 '23 11:10 dawnmy