FindRepeats error

Open abracarambar opened this issue 11 years ago • 0 comments

Hi everyone, I am trialling MetAMOS on metagenomics data and getting errors at the FindRepeats step. I first did a trial run with a sample of my FASTQ file (1/10th of the reads), and the pipeline ran through to Postprocessing without problems. With the complete FASTQ file, however, it gets stuck at the FindRepeats step.

This is the PBS script I ran on our server (on both the partial and the complete FASTQ file):

#!/bin/bash
#PBS -N run_Pip_step3
#PBS -r y
#PBS -A Account-Name
#PBS -M Email address
#PBS -l select=1:ncpus=8:mem=65gb:NodeType=large
#PBS -l walltime=300:00:00

cd $PBS_O_WORKDIR

module load metamos
module load samtools

if [ ! -d metamos_test_output11 ]; then
    initPipeline -q -1 Trimmed_Sponge1_1_paired.fastq -2 Trimmed_Sponge1_2_paired.fastq -d metamos_test_output11 -i 50:150
fi
runPipeline -g fraggenescan -t -p 8 -d metamos_test_output11

The log file for the successful run on the partial file was:

########################### Execution Started #############################
JobId:707091.paroo3
UserName:xxxxxx
GroupName:xx
ExecutionHost:b10a08
WorkingDir:/var/spool/PBS/mom_priv
###############################################################################
rm: cannot remove `/ebi/bscratch/uqmgauth/sponge/SYMBIONT-SEQ/TRIMMED-DATA/metamos_test_output6/Logs/*.started': No such file or directory
    Job  = [[Trimmed_Sponge1_1_paired_head.fastq, Trimmed_Sponge1_2_paired_head.fastq] -> preprocess.success] completed
Completed Task = preprocess.Preprocess
    Job  = [preprocess.success -> [assemble.ok]] completed
Completed Task = assemble.Assemble
    Job  = [proba.asm.contig -> mapreads.success] completed
Completed Task = mapreads.MapReads
    Job  = [proba.asm.contig -> proba.faa] completed
Completed Task = findorfs.FindORFS
    Job  = [proba.fna -> proba.repeats] completed
Completed Task = findreps.FindRepeats
    Job  = [proba.faa -> proba.hits] completed
Completed Task = annotate.Annotate
    Job  = [proba.faa -> [blast.out, krona.ec.input]] completed
Completed Task = fannotate.FunctionalAnnotation
    Job  = [[proba.asm.contig] -> proba.scaffolds.final] completed
Completed Task = scaffold.Scaffold
    Job  = [proba.linearize.scaffolds.final -> proba.scaffolds.orfs] completed
Completed Task = findscforfs.FindScaffoldORFS
    Job  = [proba.annots -> propagate.ok] completed
Completed Task = propagate.Propagate
    Job  = [proba.asm.contig -> proba.taxprof.pct.txt] completed
Completed Task = abundance.Abundance
    Job  = [proba.clusters -> sorted.txt] completed
Completed Task = classify.Classify
    Job  = [proba.asm.contig -> proba.scf.fa] completed
Completed Task = postprocess.Postprocess
########################### Job Execution History #############################
JobId:707091.paroo3
UserName:XXXXXXX
GroupName:XXXXX
JobName:run_Pip_step3
SessionId:30859
ResourcesRequested:mem=65gb,ncpus=8,place=free,walltime=300:00:00
ResourcesUsed:cpupercent=777,cput=30:29:44,mem=4754168kb,ncpus=8,vmem=52210360kb,walltime=08:33:17
QueueUsed:workq
AccountString:XXXXXXX
ExitStatus:0
###############################################################################

The log file for the failed run on the complete FASTQ file was:

########################### Execution Started #############################
JobId:710566.paroo3
UserName:xxxxx
GroupName:xx
ExecutionHost:b10b11
WorkingDir:/var/spool/PBS/mom_priv
###############################################################################
rm: cannot remove `/ebi/bscratch/uqmgauth/sponge/SYMBIONT-SEQ/TRIMMED-DATA/metamos_test_output9/Logs/*.started': No such file or directory
    Job  = [[Trimmed_Sponge1_1_paired.fastq, Trimmed_Sponge1_2_paired.fastq] -> preprocess.success] completed
Completed Task = preprocess.Preprocess
    Job  = [preprocess.success -> [assemble.ok]] completed
Completed Task = assemble.Assemble
    Job  = [proba.asm.contig -> mapreads.success] completed
Completed Task = mapreads.MapReads
    Job  = [proba.asm.contig -> proba.faa] completed
Completed Task = findorfs.FindORFS
rm: cannot remove `/ebi/bscratch/uqmgauth/sponge/SYMBIONT-SEQ/TRIMMED-DATA/metamos_test_output9/Logs/findrepeats.ok': No such file or directory
ruffus.ruffus_exceptions.RethrownJobError: 
    
    
    
    Exception #1
      'exceptions.NameError(global name 'JobSignalledBreak' is not defined)' raised in ...
       Task = def findreps.FindRepeats(...):
       Job  = [proba.fna -> proba.repeats]
    
    Traceback (most recent call last):
      File "/sw/metAMOS/1.1/bin/Utilities/ruffus/task.py", line 616, in run_pooled_job_without_exceptions
        return_value =  job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
      File "/sw/metAMOS/1.1/bin/Utilities/ruffus/task.py", line 486, in job_wrapper_io_files
        ret_val = user_defined_work_func(*param)
      File "/sw/metAMOS/1.1/bin/src/findreps.py", line 112, in FindRepeats
        getContigRepeats("%s/FindRepeats/in/%s.fna"%(_settings.rundir,_settings.PREFIX), "%s/FindRepeats/out/%s.repeats"%(_settings.rundir,_settings.PREFIX))
      File "/sw/metAMOS/1.1/bin/src/findreps.py", line 42, in getContigRepeats
        run_process(_settings, "%s --minreplen=200 --z=17 --sequence=%s.merged --xmfa=%s.xmfa"%(_settings.REPEATOIRE,contigFile,contigFile),"FindRepeats")
      File "/sw/metAMOS/1.1/bin/src/utils.py", line 608, in run_process
        raise (JobSignalledBreak)
    NameError: global name 'JobSignalledBreak' is not defined
    
    
########################### Job Execution History #############################
JobId:710566.paroo3
UserName:xxxx
GroupName:xx
JobName:run_Pip_step3
SessionId:22488
ResourcesRequested:mem=65gb,ncpus=8,place=free,walltime=300:00:00
ResourcesUsed:cpupercent=771,cput=08:12:56,mem=41168kb,ncpus=8,vmem=63878652kb,walltime=05:04:45
QueueUsed:workq
AccountString:xxxxxx
ExitStatus:1
###############################################################################
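
Note on the traceback: the NameError looks like a secondary failure. Line 608 of utils.py executes `raise (JobSignalledBreak)`, but that name is not defined in that scope, so Python raises a NameError in place of the exception that was meant to signal a failed command. The real problem is therefore more likely whatever made run_process abort in the first place. Below is a minimal sketch of this masking effect, written for Python 3. The names `run_process_broken`, `run_process_fixed`, `CommandFailed` and `describe` are hypothetical stand-ins for illustration, not metAMOS or ruffus code.

```python
class CommandFailed(Exception):
    """Hypothetical exception used by the corrected helper below."""


def run_process_broken(exit_code):
    """Mimics a helper that tries to signal failure with an undefined name."""
    if exit_code != 0:
        # JobSignalledBreak is never imported or defined in this module,
        # so evaluating the name raises NameError before anything is raised.
        raise (JobSignalledBreak)  # noqa: F821


def run_process_fixed(exit_code):
    """Same check, but the exception class actually exists in scope."""
    if exit_code != 0:
        raise CommandFailed("command exited with status %d" % exit_code)


def describe(func, exit_code):
    """Return the name of the exception type raised by func, or 'ok'."""
    try:
        func(exit_code)
    except Exception as err:
        return type(err).__name__
    return "ok"


print(describe(run_process_broken, 1))  # the failure is reported as a NameError
print(describe(run_process_fixed, 1))   # the failure is reported as intended
```

In both variants the external command has already failed by the time the exception is raised, so fixing the name would only make the error message clearer. It would not make FindRepeats succeed.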

The FINDREPEATS.log is as follows:

setting minimum multiplicity to 2.
setting maximimum multiplicity to 500.
seed weight set to 17.
Sequence loaded successfully.
/ebi/bscratch/uqmgauth/sponge/SYMBIONT-SEQ/TRIMMED-DATA/metamos_test_output9/FindRepeats/in/proba.fna.merged 1144236309 base pairs.
Creating sorted mer list
Create time was: 332 seconds.
Using seed weight: 17 and w: 51
Total number of seed matches found: 71263068
terminate called after throwing an instance of 'std::bad_alloc'
  what():  St9bad_alloc
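
Note on this log: `std::bad_alloc` is the exception a C++ program throws when a memory allocation fails, so the repeat finder appears to have run out of memory after loading about 1.14 billion base pairs. Below is a small sketch for pulling the key numbers out of such a log when comparing runs. The function name `summarize_findrepeats_log` is hypothetical, the regular expressions assume the wording shown above, and the sample text shortens the input path for readability.

```python
import re


def summarize_findrepeats_log(text):
    """Extract a few facts from FINDREPEATS.log-style text (hypothetical helper)."""
    bases = re.search(r"(\d+) base pairs", text)
    seeds = re.search(r"Total number of seed matches found: (\d+)", text)
    return {
        "base_pairs": int(bases.group(1)) if bases else None,
        "seed_matches": int(seeds.group(1)) if seeds else None,
        # A failed C++ allocation shows up as std::bad_alloc in the log.
        "out_of_memory": "std::bad_alloc" in text,
    }


# Sample text copied from the log above, with the input path shortened.
sample_log = """Sequence loaded successfully.
proba.fna.merged 1144236309 base pairs.
Creating sorted mer list
Create time was: 332 seconds.
Using seed weight: 17 and w: 51
Total number of seed matches found: 71263068
terminate called after throwing an instance of 'std::bad_alloc'
  what():  St9bad_alloc
"""

summary = summarize_findrepeats_log(sample_log)
print(summary)
```

Running the same helper on the log from the smaller trial run would show how input size and seed-match count differ between the run that finished and the one that did not.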

I reran a separate job requesting a whole node and adding the line "ulimit -s unlimited" to the PBS script, but it also stalled at the FindRepeats step.
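
A note on the ulimit attempt: `ulimit -s` only changes the stack size limit, while a `std::bad_alloc` comes from a failed heap allocation. That may be why raising the stack limit made no difference. The limits more likely to matter are the address-space limit (`ulimit -v`) and whatever memory cap the scheduler applies to the job. Below is a minimal sketch, using Python's standard `resource` module on Linux, for printing the limits a job actually runs under. The helper names `current_limits` and `fmt` are hypothetical, and the choice of which limits to print is an assumption.

```python
import resource


def current_limits():
    """Return (soft, hard) pairs for a few per-process limits (Unix only)."""
    names = {
        "stack": resource.RLIMIT_STACK,       # what `ulimit -s` controls
        "address_space": resource.RLIMIT_AS,  # what `ulimit -v` controls
        "data": resource.RLIMIT_DATA,         # what `ulimit -d` controls
    }
    return {label: resource.getrlimit(key) for label, key in names.items()}


def fmt(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


for label, (soft, hard) in current_limits().items():
    print("%-14s soft=%s hard=%s" % (label, fmt(soft), fmt(hard)))
```

Running this from inside the PBS job, just before runPipeline, would show whether the job inherits a finite address-space limit even when the stack is unlimited.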

I also checked the contents of the in and out folders under FindRepeats:

ls -alht FindRepeats/in
total 5.7G
-rw------- 1 xxx xx 4.6G 2013-10-18 07:32 proba.fna.merged.sml
drwx------ 2 xxx xx   88 2013-10-18 07:26 .
-rw------- 1 xxx xx 1.1G 2013-10-18 07:26 proba.fna.merged
lrwxrwxrwx 1 xxx xx   99 2013-10-18 07:26 proba.fna -> /ebi/bscratch/uqmgauth/sponge/SYMBIONT-SEQ/TRIMMED-DATA/metamos_test_output9/FindORFS/out/proba.fna
lrwxrwxrwx 1 xxx xx   99 2013-10-18 07:26 proba.faa -> /ebi/bscratch/uqmgauth/sponge/SYMBIONT-SEQ/TRIMMED-DATA/metamos_test_output9/FindORFS/out/proba.faa
drwx------ 4 xxx xx   25 2013-10-18 02:47 ..
ls -alht FindRepeats/out
total 0
-rw------- 1 xxx xx  0 2013-10-18 07:26 proba.repeats

It looks to me like the input files are fine, but nothing gets written to the output folder.

Finally, I've also tried running the job with more memory and CPUs:

#!/bin/bash
#PBS -N run_Pip_step_xl
#PBS -r y
#PBS -A xxxx
#PBS -M xxxxx
#PBS -l select=1:ncpus=16:mem=230gb:NodeType=xl
#PBS -l walltime=300:00:00

cd $PBS_O_WORKDIR

module load metamos
module load samtools
if [ ! -d metamos_test_output10 ]; then
    initPipeline -q -1 Trimmed_Sponge2_1_paired.fastq -2 Trimmed_Sponge2_2_paired.fastq -d metamos_test_output10 -i 50:150
fi
runPipeline -g fraggenescan -t -p 16 -d metamos_test_output10

This job is still running, but it has again been stalled at the FindRepeats step for the last 12 hours, and the proba.repeats file is still empty.

Any tips as to what I might be doing wrong?

Kind regards

abracarambar, Oct 18 '13 20:10