phyloFlash
killed while loading SAM into Memory
Hi, I'm trying to run the phyloFlash.pl script on single-end read data with a read length of 75 bp.
The provided test case for checking the installation runs without issues, and the -env test confirms the setup. System specs: 16 GiB system memory, AMD Ryzen 5 5500U with Radeon
phyloFlash.pl -lib run01 -read1 MP1_S30_R1_001.fastq.gz -readlength 75
This is phyloFlash v3.4
[21:44:58] Using dbhome '/home/leo/138.1'
[21:44:58] working on library run01
[21:44:58] Forward reads MP1_S30_R1_001.fastq.gz
[21:44:58] Running in single ended mode
[21:44:58] Current operating system linux
[21:44:58] Checking for required tools.
[21:44:58] Using nhmmer found at
"/home/leo/anaconda3/envs/pf/lib/phyloFlash/barrnap-HGV/binaries/linux/nhmmer".
[21:44:58] Using grep found at "/usr/bin/grep".
[21:44:58] Using mafft found at "/home/leo/anaconda3/envs/pf/bin/mafft".
[21:44:58] Using barrnap found at
"/home/leo/anaconda3/envs/pf/lib/phyloFlash/barrnap-HGV/bin/barrnap_HGV".
[21:44:58] Using fastaFromBed found at
"/home/leo/anaconda3/envs/pf/bin/fastaFromBed".
[21:44:58] Using plotscript_SVG found at
"/home/leo/anaconda3/envs/pf/lib/phyloFlash/phyloFlash_plotscript_svg.pl".
[21:44:58] Using spades found at
"/home/leo/anaconda3/envs/pf/bin/spades.py".
[21:44:58] Using cat found at "/usr/bin/cat".
[21:44:58] Using sed found at "/home/leo/anaconda3/envs/pf/bin/sed".
[21:44:58] Using bbmap found at "/home/leo/anaconda3/envs/pf/bin/bbmap.sh".
[21:44:58] Using vsearch found at
"/home/leo/anaconda3/envs/pf/bin/vsearch".
[21:44:58] Using awk found at "/usr/bin/awk".
[21:44:58] Using reformat found at
"/home/leo/anaconda3/envs/pf/bin/reformat.sh".
[21:44:58] All required tools found.
[21:44:58] filtering reads with SSU db using minimum identity of 70%
[21:44:58] running subcommand:
/home/leo/anaconda3/envs/pf/bin/bbmap.sh fast=t minidentity=0.7
-Xmx10g reads=-1 threads=12 po=f outputunmapped=f
path=/home/leo/138.1 out=run01.bbmap.sam
outm=run01.MP1_S30_R1_001.fastq.gz.SSU.1.fq noheader=t
ambiguous=all build=1 in=MP1_S30_R1_001.fastq.gz
bhist=run01.basecompositionhistogram ihist=run01.inserthistogram
idhist=run01.idhistogram scafstats=run01.hitstats overwrite=t
2>run01.bbmap.out
[22:14:10] done...
[22:14:10] Reading SAM file run01.bbmap.sam into memory
Killed
dmesg shows that the OOM killer terminated it:
[Sa Jan 15 22:14:40 2022] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[email protected],task=perl,pid=57464,uid=1000
[Sa Jan 15 22:14:40 2022] Out of memory: Killed process 57464 (perl) total-vm:10282140kB, anon-rss:10259012kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:20148kB oom_score_adj:0
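For scale, the kB figures in the OOM line can be converted to GiB. Below is a minimal Python sketch; the helper name `oom_fields_gib` is made up for illustration, and the input line is trimmed to the memory fields shown above.

```python
import re

# OOM-killer line copied from the dmesg output above (trimmed to the memory fields).
OOM_LINE = (
    "Out of memory: Killed process 57464 (perl) "
    "total-vm:10282140kB, anon-rss:10259012kB, file-rss:0kB, shmem-rss:0kB"
)

def oom_fields_gib(line):
    """Return {field name: size in GiB} for every 'name:<number>kB' field."""
    fields = re.findall(r"([\w-]+):(\d+)kB", line)
    # 1 GiB = 1024 * 1024 kB
    return {name: int(kb) / 1024**2 for name, kb in fields}

for name, gib in oom_fields_gib(OOM_LINE).items():
    print(f"{name}: {gib:.2f} GiB")
```

The perl process alone held roughly 9.8 GiB of anonymous memory when it was killed, on a machine with 16 GiB in total.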
My read file is 242.4 MB. The SAM file that is about to be read is 2.7 GB. Is the size of the SAM file the issue here? In phyloFlash.pl, at line 1133, I found a function called "open_or_die". Is that what causes the kill? (I'm not familiar with Perl.)
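One way to check whether the sheer number of alignments is the problem is to stream through the SAM file and count records instead of loading it whole. The following is a minimal sketch, assuming a plain tab-separated SAM file; the function name `count_sam_records` is hypothetical, and this is not how phyloFlash itself reads the file.

```python
def count_sam_records(path):
    """Stream a SAM file line by line.

    Returns (number of alignment records, number of distinct read names).
    Only the set of read names is kept in memory, not the records themselves.
    """
    records = 0
    read_names = set()
    with open(path) as handle:
        for line in handle:
            # Skip blank lines and SAM header lines, which start with '@'.
            if not line.strip() or line.startswith("@"):
                continue
            records += 1
            # The first tab-separated column of a SAM record is the read name.
            read_names.add(line.split("\t", 1)[0])
    return records, len(read_names)
```

Calling `count_sam_records("run01.bbmap.sam")` returns both counts. A record count far above the number of distinct reads would suggest that many reads are reported at several reference positions, which is plausible with `ambiguous=all` in the bbmap call. The set of read names still needs memory of its own, though much less than the full records.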
Does anyone have recommendations on what to do?
I was able to fix the issue by raising the -id parameter to 90%. My guess is that 70% identity was too unspecific and therefore produced too many hits. Does this make sense?
I think you are correct: with your high number of input reads, the number of reads recruited at 70% identity is too high for EMIRGE to handle. With 90% you still cover much of the sequence space that EMIRGE works with, so that is a very good solution.