metAMOS
FindRepeats error
Hi everyone, I am trialling MetAMOS on metagenomics data and getting errors at the FindRepeats step. I first did a trial run on a subsample of my FASTQ files (1/10th of the reads), and the pipeline ran through to the Postprocess step without problems. When I provide the complete FASTQ files, however, it gets stuck at the FindRepeats step.
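The post does not say how the 1/10th sample was taken; the `_head.fastq` names in the successful run's log suggest the first reads of each file were kept. A minimal bash sketch of that, assuming plain 4-line FASTQ records and that both mate files hold the same number of reads in the same order. `subsample_head` is a hypothetical helper, not part of metAMOS.

```bash
#!/bin/bash
# Hypothetical helper: keep the first 1/DIVISOR of the reads in a FASTQ file.
# Assumes plain 4-line FASTQ records (no wrapped sequence or quality lines).
subsample_head() {
    local in_fq="$1" out_fq="$2" divisor="${3:-10}"
    local total_lines reads_kept
    total_lines=$(wc -l < "$in_fq")
    # 4 lines per record; integer division drops any remainder
    reads_kept=$(( total_lines / 4 / divisor ))
    head -n $(( reads_kept * 4 )) "$in_fq" > "$out_fq"
}
```

For example, `subsample_head Trimmed_Sponge1_1_paired.fastq Trimmed_Sponge1_1_paired_head.fastq 10`, then the same call for the `_2_` file. Taking the same number of leading records from each mate file keeps the pairs matched. The first reads are not a random sample, and a tenth of the data can give a much smaller assembly, so later steps such as FindRepeats may see far less sequence than in the full run.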
This is the PBS script I ran on our server (for both the partial and the complete FASTQ files):
#!/bin/bash
#PBS -N run_Pip_step3
#PBS -r y
#PBS -A Account-Name
#PBS -M Email address
#PBS -l select=1:ncpus=8:mem=65gb:NodeType=large
#PBS -l walltime=300:00:00
cd $PBS_O_WORKDIR
module load metamos
module load samtools
if [ ! -d metamos_test_output11 ]; then
    initPipeline -q -1 Trimmed_Sponge1_1_paired.fastq -2 Trimmed_Sponge1_2_paired.fastq -d metamos_test_output11 -i 50:150
fi
runPipeline -g fraggenescan -t -p 8 -d metamos_test_output11
The log file for the successful run on the partial file was:
########################### Execution Started #############################
JobId:707091.paroo3
UserName:xxxxxx
GroupName:xx
ExecutionHost:b10a08
WorkingDir:/var/spool/PBS/mom_priv
###############################################################################
rm: cannot remove `/ebi/bscratch/uqmgauth/sponge/SYMBIONT-SEQ/TRIMMED-DATA/metamos_test_output6/Logs/*.started': No such file or directory
Job = [[Trimmed_Sponge1_1_paired_head.fastq, Trimmed_Sponge1_2_paired_head.fastq] -> preprocess.success] completed
Completed Task = preprocess.Preprocess
Job = [preprocess.success -> [assemble.ok]] completed
Completed Task = assemble.Assemble
Job = [proba.asm.contig -> mapreads.success] completed
Completed Task = mapreads.MapReads
Job = [proba.asm.contig -> proba.faa] completed
Completed Task = findorfs.FindORFS
Job = [proba.fna -> proba.repeats] completed
Completed Task = findreps.FindRepeats
Job = [proba.faa -> proba.hits] completed
Completed Task = annotate.Annotate
Job = [proba.faa -> [blast.out, krona.ec.input]] completed
Completed Task = fannotate.FunctionalAnnotation
Job = [[proba.asm.contig] -> proba.scaffolds.final] completed
Completed Task = scaffold.Scaffold
Job = [proba.linearize.scaffolds.final -> proba.scaffolds.orfs] completed
Completed Task = findscforfs.FindScaffoldORFS
Job = [proba.annots -> propagate.ok] completed
Completed Task = propagate.Propagate
Job = [proba.asm.contig -> proba.taxprof.pct.txt] completed
Completed Task = abundance.Abundance
Job = [proba.clusters -> sorted.txt] completed
Completed Task = classify.Classify
Job = [proba.asm.contig -> proba.scf.fa] completed
Completed Task = postprocess.Postprocess
########################### Job Execution History #############################
JobId:707091.paroo3
UserName:XXXXXXX
GroupName:XXXXX
JobName:run_Pip_step3
SessionId:30859
ResourcesRequested:mem=65gb,ncpus=8,place=free,walltime=300:00:00
ResourcesUsed:cpupercent=777,cput=30:29:44,mem=4754168kb,ncpus=8,vmem=52210360kb,walltime=08:33:17
QueueUsed:workq
AccountString:XXXXXXX
ExitStatus:0
###############################################################################
The log file for the run that failed on the complete FASTQ files was:
########################### Execution Started #############################
JobId:710566.paroo3
UserName:xxxxx
GroupName:xx
ExecutionHost:b10b11
WorkingDir:/var/spool/PBS/mom_priv
###############################################################################
rm: cannot remove `/ebi/bscratch/uqmgauth/sponge/SYMBIONT-SEQ/TRIMMED-DATA/metamos_test_output9/Logs/*.started': No such file or directory
Job = [[Trimmed_Sponge1_1_paired.fastq, Trimmed_Sponge1_2_paired.fastq] -> preprocess.success] completed
Completed Task = preprocess.Preprocess
Job = [preprocess.success -> [assemble.ok]] completed
Completed Task = assemble.Assemble
Job = [proba.asm.contig -> mapreads.success] completed
Completed Task = mapreads.MapReads
Job = [proba.asm.contig -> proba.faa] completed
Completed Task = findorfs.FindORFS
rm: cannot remove `/ebi/bscratch/uqmgauth/sponge/SYMBIONT-SEQ/TRIMMED-DATA/metamos_test_output9/Logs/findrepeats.ok': No such file or directory
ruffus.ruffus_exceptions.RethrownJobError:
Exception #1
'exceptions.NameError(global name 'JobSignalledBreak' is not defined)' raised in ...
Task = def findreps.FindRepeats(...):
Job = [proba.fna -> proba.repeats]
Traceback (most recent call last):
  File "/sw/metAMOS/1.1/bin/Utilities/ruffus/task.py", line 616, in run_pooled_job_without_exceptions
    return_value = job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
  File "/sw/metAMOS/1.1/bin/Utilities/ruffus/task.py", line 486, in job_wrapper_io_files
    ret_val = user_defined_work_func(*param)
  File "/sw/metAMOS/1.1/bin/src/findreps.py", line 112, in FindRepeats
    getContigRepeats("%s/FindRepeats/in/%s.fna"%(_settings.rundir,_settings.PREFIX), "%s/FindRepeats/out/%s.repeats"%(_settings.rundir,_settings.PREFIX))
  File "/sw/metAMOS/1.1/bin/src/findreps.py", line 42, in getContigRepeats
    run_process(_settings, "%s --minreplen=200 --z=17 --sequence=%s.merged --xmfa=%s.xmfa "%(_settings.REPEATOIRE,contigFile,contigFile),"FindRepeats")
  File "/sw/metAMOS/1.1/bin/src/utils.py", line 608, in run_process
    raise (JobSignalledBreak)
NameError: global name 'JobSignalledBreak' is not defined
########################### Job Execution History #############################
JobId:710566.paroo3
UserName:xxxx
GroupName:xx
JobName:run_Pip_step3
SessionId:22488
ResourcesRequested:mem=65gb,ncpus=8,place=free,walltime=300:00:00
ResourcesUsed:cpupercent=771,cput=08:12:56,mem=41168kb,ncpus=8,vmem=63878652kb,walltime=05:04:45
QueueUsed:workq
AccountString:xxxxxx
ExitStatus:1
###############################################################################
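One number worth comparing across the two accounting lines is `vmem`. A small bash sketch that expresses it as a percentage of the `mem=65gb` request, assuming PBS's binary size units (1gb = 1024 * 1024 kb). `vmem_pct_of_request` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: what percentage of the requested memory did a job's vmem reach?
# Assumes PBS-style binary units: 1gb = 1024 * 1024 kb.
vmem_pct_of_request() {
    local vmem_kb="$1" request_gb="$2"
    local request_kb=$(( request_gb * 1024 * 1024 ))
    # Integer percentage, rounded down
    echo $(( vmem_kb * 100 / request_kb ))
}
```

`vmem_pct_of_request 63878652 65` prints 93 for the failed full run, and `vmem_pct_of_request 52210360 65` prints 76 for the successful partial run. Peaking at roughly 93% of the request is consistent with an allocation being refused near a memory limit, but it is not proof: whether PBS enforces `vmem` at all depends on the site's configuration.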
The FINDREPEATS.log is as follows:
setting minimum multiplicity to 2.
setting maximimum multiplicity to 500.
seed weight set to 17.
Sequence loaded successfully.
/ebi/bscratch/uqmgauth/sponge/SYMBIONT-SEQ/TRIMMED-DATA/metamos_test_output9/FindRepeats/in/proba.fna.merged 1144236309 base pairs.
Creating sorted mer list
Create time was: 332 seconds.
Using seed weight: 17 and w: 51
Total number of seed matches found: 71263068
terminate called after throwing an instance of 'std::bad_alloc'
  what():  St9bad_alloc
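The log ends with `terminate called after throwing an instance of 'std::bad_alloc'`. C++ throws `std::bad_alloc` when a heap allocation fails, and since nothing caught it, the program invoked through `_settings.REPEATOIRE` aborted right after reporting the seed matches. A sketch for checking a log for this, assuming the log path is passed in (the location of FINDREPEATS.log is not shown here). `log_has_bad_alloc` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: report whether a FindRepeats log ends in an allocation failure.
# std::bad_alloc is what C++ throws when operator new cannot allocate memory.
log_has_bad_alloc() {
    local log="$1"
    if grep -q 'std::bad_alloc' "$log"; then
        echo "bad_alloc"
        # Also show the last seed-match count reported before the abort, if any
        grep 'Total number of seed matches found' "$log" | tail -n 1
        return 0
    fi
    echo "no bad_alloc"
    return 1
}
```

Its exit status (0 when the abort is found) makes it usable in a job script's `if` to stop before retrying a step that has already run out of memory.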
I reran a separate job requesting a whole node and also adding the line "ulimit -s unlimited" to the PBS script, but it also stalled at the FindRepeats step.
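`ulimit -s unlimited` lifts only the stack-size limit, whereas `std::bad_alloc` comes from heap allocations, which can be bounded by the virtual-memory limit (`ulimit -v`), the data-segment limit (`ulimit -d`), and whatever the batch system enforces on its own. A sketch that logs these shell limits from inside the job. `show_mem_limits` is a hypothetical helper, and on sites that enforce memory through cgroups rather than per-process limits these values may all read `unlimited`.

```bash
#!/bin/bash
# Print the per-process limits relevant to large heap allocations.
# -s: stack size (the only one "ulimit -s unlimited" changes)
# -v: virtual memory, -d: data segment. Values are in KiB or "unlimited".
show_mem_limits() {
    echo "stack (ulimit -s): $(ulimit -s)"
    echo "virtual memory (ulimit -v): $(ulimit -v)"
    echo "data segment (ulimit -d): $(ulimit -d)"
}
```

Calling `show_mem_limits` near the top of the PBS script, after any `ulimit` lines, records in the job's output which limits the pipeline actually inherited.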
I also checked the contents of the in and out folders under FindRepeats:
ls -alht FindRepeats/in
total 5.7G
-rw------- 1 xxx xx 4.6G 2013-10-18 07:32 proba.fna.merged.sml
drwx------ 2 xxx xx 88 2013-10-18 07:26 .
-rw------- 1 xxx xx 1.1G 2013-10-18 07:26 proba.fna.merged
lrwxrwxrwx 1 xxx xx 99 2013-10-18 07:26 proba.fna -> /ebi/bscratch/uqmgauth/sponge/SYMBIONT-SEQ/TRIMMED-DATA/metamos_test_output9/FindORFS/out/proba.fna
lrwxrwxrwx 1 xxx xx 99 2013-10-18 07:26 proba.faa -> /ebi/bscratch/uqmgauth/sponge/SYMBIONT-SEQ/TRIMMED-DATA/metamos_test_output9/FindORFS/out/proba.faa
drwx------ 4 xxx xx 25 2013-10-18 02:47 ..
ls -alht FindRepeats/out
total 0
-rw------- 1 xxx xx 0 2013-10-18 07:26 proba.repeats
It looks to me like the input files are fine, but nothing gets written to the output folder (proba.repeats is 0 bytes).
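A sketch that reports whether each expected FindRepeats output exists and is non-empty, using `[ -s ]` (true for a file larger than zero bytes). The `proba.repeats` path follows the listings above; the `.xmfa` path is inferred from the `--xmfa=%s.xmfa` argument in the traceback, where `%s` is `FindRepeats/in/proba.fna`, so treat it as an assumption. `check_findrepeats` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: summarize the FindRepeats output files of one run directory.
# The .xmfa location is inferred from the "--xmfa=%s.xmfa" argument (an assumption).
check_findrepeats() {
    local rundir="$1" prefix="${2:-proba}"
    local f
    for f in "$rundir/FindRepeats/in/$prefix.fna.xmfa" \
             "$rundir/FindRepeats/out/$prefix.repeats"; do
        if [ -s "$f" ]; then
            echo "non-empty: $f"
        elif [ -e "$f" ]; then
            echo "empty: $f"
        else
            echo "missing: $f"
        fi
    done
}
```

Run against `metamos_test_output9`, it would be expected to report the `.repeats` file as empty and the `.xmfa` file as missing, matching the `ls` output.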
Finally, I also tried running the job with more memory and CPUs:
#!/bin/bash
#PBS -N run_Pip_step_xl
#PBS -r y
#PBS -A xxxx
#PBS -M xxxxx
#PBS -l select=1:ncpus=16:mem=230gb:NodeType=xl
#PBS -l walltime=300:00:00
cd $PBS_O_WORKDIR
module load metamos
module load samtools
if [ ! -d metamos_test_output10 ]; then
    initPipeline -q -1 Trimmed_Sponge2_1_paired.fastq -2 Trimmed_Sponge2_2_paired.fastq -d metamos_test_output10 -i 50:150
fi
runPipeline -g fraggenescan -t -p 16 -d metamos_test_output10
This job is still running, but it has again been stalled at the FindRepeats step for the last 12 hours, and the proba.repeats file is still empty.
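To tell whether the step is still writing anything, the age of its files can be checked while the job runs. A sketch assuming GNU coreutils, as on most Linux clusters (`stat -c %Y` prints the modification time in seconds since the epoch). `minutes_since_modified` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: minutes since a file was last modified.
# Uses GNU coreutils "stat -c %Y" (mtime as epoch seconds), so it assumes Linux.
minutes_since_modified() {
    local f="$1"
    local now mtime
    now=$(date +%s)
    mtime=$(stat -c %Y "$f") || return 1
    echo $(( (now - mtime) / 60 ))
}
```

For example, `minutes_since_modified metamos_test_output10/FindRepeats/out/proba.repeats`. A file that keeps aging only shows that nothing has been written; whether the process is still consuming CPU has to be read from the job's accounting, for example the `resources_used` fields in `qstat -f` output on PBS.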
Any tips as to what I might be doing wrong?
Kind regards