metAMOS
Error at Scaffold step for my dataset
Dear Dr. Sergey,
As you suggested in issue 197, I replaced -f with -q and ran the shell script below on my workstation. The Scaffold step ran for a long time and then failed with the following error.
Shell script name : JSN_sample1.1.sh
#!/bin/sh
../initPipeline -q -1 S002984_r1.fastq -2 S002984_r2.fastq -d JSNSAMPLE3 -i 300:500
../runPipeline -a soap -c kraken -g fraggenescan -p 15 -d S002984_r1_sample1.1 -k 55 -f Assemble,MapReads,FindORFS,Annotate,FunctionalAnnotation,Propagate,Classify,Abundance,FindScaffoldORFS -n FunctionalAnnotation
wenchenaafc@wenchenaafc:~/metAMOS-1.5rc3/JSN$ ./JSN_sample1.1.sh
Error: cannot find BLAST DB directory, expected it in /home/wenchenaafc/metAMOS-1.5rc3/Utilities/DB/. Disabling blastdb dependent programs
Project dir /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1 successfully created!
Use runPipeline.py to start Pipeline
Error: cannot find BLAST DB directory, expected it in /home/wenchenaafc/metAMOS-1.5rc3/Utilities/DB/. Disabling blastdb dependent programs
Error: cannot find BLAST DB directory, expected it in /home/wenchenaafc/metAMOS-1.5rc3/Utilities/DB/. Disabling blastdb dependent programs
**no blast DB directory available, disabling steps requiring BLAST DB
Starting Task = runpipeline.RUNPIPELINE
Starting metAMOS pipeline
Error: cannot find BLAST DB directory, expected it in /home/wenchenaafc/metAMOS-1.5rc3/Utilities/DB/. Disabling blastdb dependent programs
Warning: Celera Assembler is not found, some functionality will not be available
Warning: BLASR is not found, some functionality will not be available
Warning: Newbler is not found, some functionality will not be available
Warning: MetaGeneMark is not found, some functionality will not be available
Warning: SignalP+ is not found, some functionality will not be available
Warning: metaphylerClassify is not found, some functionality will not be available
Warning: PHmmer is not found, some functionality will not be available
Warning: PhyloSift was not found, will not be available
Warning: FRCbam is not found, some functionality will not be available
Warning: MPI is not available, some functionality may not be available
[Available RAM: 65 GB]
*ok
Tasks which will be run:
Task = preprocess.Preprocess
Task = assemble.SplitAssemblers
Task = assemble.Assemble
Task = assemble.CheckAsmResults
Task = assemble.SplitMappers
Task = mapreads.MapReads
Task = mapreads.CheckMapResults
Task = mapreads.SplitForORFs
Task = findorfs.FindORFS
Task = validate.Validate
Task = findreps.FindRepeats
Task = annotate.Annotate
Task = fannotate.FunctionalAnnotation
Task = scaffold.Scaffold
Task = findscforfs.FindScaffoldORFS
Task = abundance.Abundance
Task = propagate.Propagate
Task = classify.Classify
Task = postprocess.Postprocess
Warning: Graphviz is not found, some functionality will not be available
metAMOS configuration summary:
metAMOS Version: v1.5rc2 "Praline Brownie" workflows: core,imetamos
Time and Date: 2015-04-22
Working directory: /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1
Prefix: proba
K-Mer: 55
Threads: 15
Taxonomic level: class
Verbose: False
Steps to skip: MultiAlign, FunctionalAnnotation, FindRepeats
Steps to force: Abundance, FindORFS, Annotate, Propagate, MapReads, Assemble, FindScaffoldORFS, Classify
[citation] .......
sh: 1: Syntax error: Bad fd number
Starting Task = preprocess.PREPROCESS
Job = [[S002984_r1.fastq, S002984_r2.fastq] -> preprocess.success] completed
Completed Task = preprocess.Preprocess
Starting Task = assemble.ASSEMBLE
Job = [preprocess.success -> .run] completed
Completed Task = assemble.SplitAssemblers
Job = [soapdenovo.55.run -> soapdenovo.55.asm.contig] completed
Completed Task = assemble.Assemble
Job = [[soapdenovo.55.asm.contig] -> [assemble.ok]] completed
Completed Task = assemble.CheckAsmResults
Uptodate Task = assemble.SplitMappers
Starting Task = mapreads.MAPREADS
Job = [soapdenovo.55.asm.contig -> soapdenovo.55.contig.cvg] completed
Completed Task = mapreads.MapReads
Job = [[soapdenovo.55.contig.cvg] -> [mapreads.ok]] completed
Completed Task = mapreads.CheckMapResults
Uptodate Task = mapreads.SplitForORFs
Starting Task = findorfs.FINDORFS
Job = [soapdenovo.55.contig.cvg -> soapdenovo.55.faa] completed
Completed Task = findorfs.FindORFS
Starting Task = validate.VALIDATE
Job = [[soapdenovo.55.faa] -> [validate.ok]] completed
Completed Task = validate.Validate
Starting Task = findrepeats.FINDREPEATS
Job = [proba.fna -> proba.repeats] completed
Completed Task = findreps.FindRepeats
Starting Task = annotate.ANNOTATE
Job = [proba.faa -> proba.hits] completed
Completed Task = annotate.Annotate
Starting Task = functionalannotation.FUNCTIONALANNOTATION
Job = [proba.faa -> [blast.out, krona.ec.input]] completed
Completed Task = fannotate.FunctionalAnnotation
Starting Task = scaffold.SCAFFOLD
**************************************************************** _ERROR_**********
During scaffold, the following command failed with return code -11:
/home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/OrientContigs -minRedundancy 5 -all -redundancy 10 -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk -repeats /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.reps
_DETAILS_**********
Last 10 commands run before the error (/home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Logs/COMMANDS.log)
|2015-04-22 08:39:50|# [SCAFFOLD]
|2015-04-22 08:39:51| rm -rf /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk
|2015-04-22 08:44:30| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/toAmos_new -Q /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Preprocess/out/lib1.seq -i --min 1 --max 2180 --libname lib1 -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk
|2015-04-22 08:45:41| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/toAmos_new -c /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Assemble/out/proba.asm.tigr -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk
|2015-04-22 08:48:03| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/asmQC -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk -scaff -recompute -update -numsd 2
|2015-04-22 08:48:03| perl /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/bank-unlock /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk
|2015-04-22 08:50:03| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/clk -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk
|2015-04-22 08:51:14| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/Bundler -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk
|2015-04-22 09:03:31| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/MarkRepeats -redundancy 50 -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk > /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.reps
|2015-04-22 09:16:26| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/OrientContigs -minRedundancy 5 -all -redundancy 10 -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk -repeats /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.reps
Last 10 lines of output (/home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Logs/SCAFFOLD.log)
FOR SKIPPED EDGE 628693 SET EDGE STATUS TO BE 5
FOR SKIPPED EDGE 666344 SET EDGE STATUS TO BE 6
FOR SKIPPED EDGE 687721 SET EDGE STATUS TO BE 5
FOR SKIPPED EDGE 690038 SET EDGE STATUS TO BE 6
FOR SKIPPED EDGE 692427 SET EDGE STATUS TO BE 6
FOR SKIPPED EDGE 702856 SET EDGE STATUS TO BE 5
FOR SKIPPED EDGE 724282 SET EDGE STATUS TO BE 5
FOR SKIPPED EDGE 733650 SET EDGE STATUS TO BE 6
FOR SKIPPED EDGE 737280 SET EDGE STATUS TO BE 5
FOR SKIPPED EDGE 745176 SET EDGE STATUS TO BE 6
Please veryify input data and restart MetAMOS. If the problem persists please contact the MetAMOS development team. _ERROR_**********
rm: cannot remove ‘/home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Logs/scaffold.ok’: No such file or directory
Oops, MetAMOS finished with errors! see text in red above for details.
wenchenaafc@wenchenaafc:~/metAMOS-1.5rc3/JSN$
Dear Dr. Sergey,
I tried two different samples and still get the same error on my local machine. What is the reason for this error? Any help is much appreciated.
Usually this would mean that scaffolding crashed on your system; however, there is no error message in the output indicating why the program exited. You can work around the issue by skipping the scaffold step using -n Scaffold in your runPipeline command. If you can share your dataset, we can try to reproduce the error locally.
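For reference, a return code of -11 like the one in the error above often points to a segmentation fault. This is an assumption based on the Python subprocess convention (the pipeline is started through runPipeline.py), where a return code of -N means the child process was terminated by signal N. A minimal sketch for decoding such a value in the shell; `signal_name` is a hypothetical helper and not part of metAMOS:

```shell
#!/bin/sh
# Decode a negative return code of the form reported in the error above.
# Assumption: -N means "terminated by signal N" (Python subprocess convention).
signal_name() {
    rc=$1
    # Strip the leading minus sign to get the signal number, then ask the
    # shell for the corresponding signal name.
    kill -l "${rc#-}"
}

signal_name -11
```

On Linux, signal 11 is SIGSEGV, so this would suggest OrientContigs crashed on an invalid memory access rather than exiting with a diagnostic of its own.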
Dear Dr. Sergey,
I did as you suggested: I skipped the Scaffold step in runPipeline and reran the command, and it completed successfully. I will check with my supervisor about sending you the data.
Only 0.006% is assigned to Bacteria. 83% of bacteria are unassigned. Raw reads - 730 from 18,207,146. Contigs - 35/263,856.
A) To improve the annotation:
- Can I use a custom database instead of minikraken?
- If so, do I need to place it under Utilities/DB/?
B) What is the drawback if I skip the Scaffold step for all my samples?
metAMOS supports several classifiers: http://metamos.readthedocs.org/en/v1.5rc3/content/programs.html
You would need to install the optional components to get most of them (PhyloSift, FCP, etc.). However, most are significantly slower than Kraken. You can also check how many of your sequences are mapped to your assembly. If a significant fraction cannot be mapped, you should add the -u option to runPipeline to classify the unmapped reads as well. You can see the unmapped reads in Postprocess/out/proba.lib1.unaligned.fasta and the aligned reads in Postprocess/out/proba.lib1.contig.reads
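As a rough way to check the unmapped fraction, one could count the records in the unaligned FASTA file and compare that with the total number of input reads. The sketch below assumes one FASTA record per unaligned read and takes the raw read count quoted earlier in this thread (18,207,146) as the total; `count_fasta_records` and `percent_unaligned` are hypothetical helpers, not part of metAMOS:

```shell
#!/bin/sh
# Count FASTA records in a file (each record header line starts with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Print the unaligned count as a percentage of the total, two decimals.
percent_unaligned() {
    awk -v u="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * u / t }'
}

# Run from the project directory; the path is the one given in the reply above.
unaligned=Postprocess/out/proba.lib1.unaligned.fasta
total_reads=18207146   # assumption: raw read count quoted in this thread
if [ -f "$unaligned" ]; then
    n=$(count_fasta_records "$unaligned")
    echo "unaligned reads: $n ($(percent_unaligned "$n" "$total_reads")%)"
fi
```

If the percentage is large, that would be the case where adding -u to runPipeline is worthwhile.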
Thanks Dr. Sergey. Yeah, I already installed optional components.
- Using FCP.
- Then using -u option in runPipeline to classify the unmapped reads.
Dear Dr. Sergey,
Due to data privacy, I am unable to send the sequences to you.
- I ran the command using FCP instead of the minikraken DB. I started the pipeline yesterday at 4pm and it is still at the Preprocess step. Does FCP normally take this many hours to run?
Yes, FCP is significantly slower than Kraken.
Dear Dr. Sergey,
The above command with FCP is still running on my workstation.
I have also been given access to our cluster (which has 1TB of RAM), so I am planning to try the full kraken DB at the same time. I downloaded the full kraken DB from the following URL (ftp://ftp.cbcb.umd.edu/pub/data/treangen/allDBs.tar.gz).
As you mentioned in issue 194 for the mini kraken DB, I am planning to extract the full kraken DB into /home/prabhakaranra/metAMOS-1.5rc3/Utilities/DB/kraken/ and replace the mini kraken DB with it. Am I correct?
I have been told to include fungal reference genomes in my database. That is why I am going for the full kraken DB, or some other database in future.
The full Kraken DB does not include fungal genomes. It includes complete RefSeq genomes for the bacterial, archaeal, and viral domains as well as H. sapiens. You can see this list in the Kraken manual: https://ccb.jhu.edu/software/kraken/MANUAL.html#standard-kraken-database
To include fungal genomes, you will need to build a Kraken database yourself. You can follow the Kraken manual: https://ccb.jhu.edu/software/kraken/MANUAL.html#custom-databases
and then place the resulting database in /home/prabhakaranra/metAMOS-1.5rc3/Utilities/DB/kraken/
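A minimal sketch of what the custom build could look like, using the `kraken-build` subcommands described in the Kraken manual linked above. The database directory `custom_kraken_db` and the `fungal_genomes` directory of FASTA files are hypothetical names. The script only prints the commands unless DRY_RUN=0 is set, and it assumes the fungal FASTA headers carry the sequence identifiers Kraken needs to assign taxonomy (see the manual):

```shell
#!/bin/sh
# Sketch: build a custom Kraken database that also contains fungal genomes.
DB=${DB:-custom_kraken_db}                # hypothetical output directory
FUNGAL_DIR=${FUNGAL_DIR:-fungal_genomes}  # hypothetical directory of *.fa files
THREADS=${THREADS:-15}

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

build_kraken_db() {
    run kraken-build --download-taxonomy --db "$DB"
    run kraken-build --download-library bacteria --db "$DB"
    # Add each fungal genome FASTA file to the library.
    for fa in "$FUNGAL_DIR"/*.fa; do
        [ -e "$fa" ] || continue   # skip if the glob matched nothing
        run kraken-build --add-to-library "$fa" --db "$DB"
    done
    run kraken-build --build --threads "$THREADS" --db "$DB"
}

build_kraken_db
```

Once the build finishes, the contents of the database directory would go into Utilities/DB/kraken/ as described above. Building in a separate directory first keeps the existing minikraken files intact until the new database is known to work.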
Thanks Dr. Sergey, I will try to build the database with fungal and bacterial sequences.