
Error at Scaffold step for my dataset

Open Ramanandan opened this issue 9 years ago • 10 comments

Dear Dr. Sergey,

As you suggested in issue 197, I replaced -f with -q and ran the shell script below on my workstation. The Scaffold step ran for a long time and then failed with the following error.

Shell script name : JSN_sample1.1.sh

#!/bin/sh

../initPipeline -q -1 S002984_r1.fastq -2 S002984_r2.fastq -d JSNSAMPLE3 -i 300:500
../runPipeline -a soap -c kraken -g fraggenescan -p 15 -d S002984_r1_sample1.1 -k 55 -f Assemble,MapReads,FindORFS,Annotate,FunctionalAnnotation,Propagate,Classify,Abundance,FindScaffoldORFS -n FunctionalAnnotation

wenchenaafc@wenchenaafc:~/metAMOS-1.5rc3/JSN$ ./JSN_sample1.1.sh
Error: cannot find BLAST DB directory, expected it in /home/wenchenaafc/metAMOS-1.5rc3/Utilities/DB/. Disabling blastdb dependent programs
Project dir /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1 successfully created!
Use runPipeline.py to start Pipeline
Error: cannot find BLAST DB directory, expected it in /home/wenchenaafc/metAMOS-1.5rc3/Utilities/DB/. Disabling blastdb dependent programs
Error: cannot find BLAST DB directory, expected it in /home/wenchenaafc/metAMOS-1.5rc3/Utilities/DB/. Disabling blastdb dependent programs
**no blast DB directory available, disabling steps requiring BLAST DB

Starting Task = runpipeline.RUNPIPELINE
Starting metAMOS pipeline
Error: cannot find BLAST DB directory, expected it in /home/wenchenaafc/metAMOS-1.5rc3/Utilities/DB/. Disabling blastdb dependent programs
Warning: Celera Assembler is not found, some functionality will not be available
Warning: BLASR is not found, some functionality will not be available
Warning: Newbler is not found, some functionality will not be available
Warning: MetaGeneMark is not found, some functionality will not be available
Warning: SignalP+ is not found, some functionality will not be available
Warning: metaphylerClassify is not found, some functionality will not be available
Warning: PHmmer is not found, some functionality will not be available
Warning: PhyloSift was not found, will not be available

Warning: FRCbam is not found, some functionality will not be available
Warning: MPI is not available, some functionality may not be available
[Available RAM: 65 GB]
*ok

Tasks which will be run:

Task = preprocess.Preprocess
Task = assemble.SplitAssemblers
Task = assemble.Assemble
Task = assemble.CheckAsmResults
Task = assemble.SplitMappers
Task = mapreads.MapReads
Task = mapreads.CheckMapResults
Task = mapreads.SplitForORFs
Task = findorfs.FindORFS
Task = validate.Validate
Task = findreps.FindRepeats
Task = annotate.Annotate
Task = fannotate.FunctionalAnnotation
Task = scaffold.Scaffold
Task = findscforfs.FindScaffoldORFS
Task = abundance.Abundance
Task = propagate.Propagate
Task = classify.Classify
Task = postprocess.Postprocess

Warning: Graphviz is not found, some functionality will not be available
metAMOS configuration summary:
metAMOS Version: v1.5rc2 "Praline Brownie" workflows: core,imetamos
Time and Date: 2015-04-22
Working directory: /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1
Prefix: proba
K-Mer: 55
Threads: 15
Taxonomic level: class
Verbose: False
Steps to skip: MultiAlign, FunctionalAnnotation, FindRepeats
Steps to force: Abundance, FindORFS, Annotate, Propagate, MapReads, Assemble, FindScaffoldORFS, Classify

[citation] .......

sh: 1: Syntax error: Bad fd number
Starting Task = preprocess.PREPROCESS
Job = [[S002984_r1.fastq, S002984_r2.fastq] -> preprocess.success] completed
Completed Task = preprocess.Preprocess
Starting Task = assemble.ASSEMBLE
Job = [preprocess.success -> .run] completed
Completed Task = assemble.SplitAssemblers
Job = [soapdenovo.55.run -> soapdenovo.55.asm.contig] completed
Completed Task = assemble.Assemble
Job = [[soapdenovo.55.asm.contig] -> [assemble.ok]] completed
Completed Task = assemble.CheckAsmResults
Uptodate Task = assemble.SplitMappers
Starting Task = mapreads.MAPREADS
Job = [soapdenovo.55.asm.contig -> soapdenovo.55.contig.cvg] completed
Completed Task = mapreads.MapReads
Job = [[soapdenovo.55.contig.cvg] -> [mapreads.ok]] completed
Completed Task = mapreads.CheckMapResults
Uptodate Task = mapreads.SplitForORFs
Starting Task = findorfs.FINDORFS
Job = [soapdenovo.55.contig.cvg -> soapdenovo.55.faa] completed
Completed Task = findorfs.FindORFS
Starting Task = validate.VALIDATE
Job = [[soapdenovo.55.faa] -> [validate.ok]] completed
Completed Task = validate.Validate
Starting Task = findrepeats.FINDREPEATS
Job = [proba.fna -> proba.repeats] completed
Completed Task = findreps.FindRepeats
Starting Task = annotate.ANNOTATE
Job = [proba.faa -> proba.hits] completed
Completed Task = annotate.Annotate
Starting Task = functionalannotation.FUNCTIONALANNOTATION
Job = [proba.faa -> [blast.out, krona.ec.input]] completed
Completed Task = fannotate.FunctionalAnnotation
Starting Task = scaffold.SCAFFOLD
********** ERROR **********
During scaffold, the following command failed with return code -11:

/home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/OrientContigs -minRedundancy 5 -all -redundancy 10 -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk -repeats /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.reps
********** DETAILS **********
Last 10 commands run before the error (/home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Logs/COMMANDS.log)
|2015-04-22 08:39:50|# [SCAFFOLD]
|2015-04-22 08:39:51| rm -rf /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk
|2015-04-22 08:44:30| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/toAmos_new -Q /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Preprocess/out/lib1.seq -i --min 1 --max 2180 --libname lib1 -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk
|2015-04-22 08:45:41| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/toAmos_new -c /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Assemble/out/proba.asm.tigr -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk
|2015-04-22 08:48:03| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/asmQC -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk -scaff -recompute -update -numsd 2
|2015-04-22 08:48:03| perl /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/bank-unlock /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk
|2015-04-22 08:50:03| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/clk -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk
|2015-04-22 08:51:14| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/Bundler -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk
|2015-04-22 09:03:31| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/MarkRepeats -redundancy 50 -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk > /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.reps
|2015-04-22 09:16:26| /home/wenchenaafc/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin/OrientContigs -minRedundancy 5 -all -redundancy 10 -b /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.bnk -repeats /home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Scaffold/in/proba.reps

Last 10 lines of output (/home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Logs/SCAFFOLD.log)
FOR SKIPPED EDGE 628693 SET EDGE STATUS TO BE 5
FOR SKIPPED EDGE 666344 SET EDGE STATUS TO BE 6
FOR SKIPPED EDGE 687721 SET EDGE STATUS TO BE 5
FOR SKIPPED EDGE 690038 SET EDGE STATUS TO BE 6
FOR SKIPPED EDGE 692427 SET EDGE STATUS TO BE 6
FOR SKIPPED EDGE 702856 SET EDGE STATUS TO BE 5
FOR SKIPPED EDGE 724282 SET EDGE STATUS TO BE 5
FOR SKIPPED EDGE 733650 SET EDGE STATUS TO BE 6
FOR SKIPPED EDGE 737280 SET EDGE STATUS TO BE 5
FOR SKIPPED EDGE 745176 SET EDGE STATUS TO BE 6

Please veryify input data and restart MetAMOS. If the problem persists please contact the MetAMOS development team.
********** ERROR **********

rm: cannot remove ‘/home/wenchenaafc/metAMOS-1.5rc3/JSN/S002984_r1_sample1.1/Logs/scaffold.ok’: No such file or directory
Oops, MetAMOS finished with errors! see text in red above for details.
wenchenaafc@wenchenaafc:~/metAMOS-1.5rc3/JSN$

Ramanandan, Apr 23 '15 19:04

Dear Dr. Sergey,

I tried two different samples and still get the same error on my local machine. What causes this error? Any help is much appreciated.

Ramanandan, Apr 24 '15 15:04

Usually this means the scaffolder crashed on your system; however, the output contains no error message indicating why the program exited. You can work around the issue by skipping the Scaffold step with -n Scaffold in your runPipeline command. If you can share your dataset, we can try to reproduce the error locally.
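One detail in the log may help narrow this down: the failed command reported return code -11. The sketch below assumes metAMOS reports the status the way Python's subprocess module does, where a negative value -N means the child process was terminated by signal N. The helper name describe_return_code is hypothetical and is not part of metAMOS.

```python
import signal


def describe_return_code(code: int) -> str:
    """Describe an exit status as reported by Python's subprocess module."""
    if code == 0:
        return "exited normally"
    if code > 0:
        return f"exited with status {code}"
    # A negative value -N means the child was terminated by signal N.
    try:
        name = signal.Signals(-code).name
    except ValueError:
        name = f"signal {-code}"
    return f"killed by {name}"


# Return code taken from the error report earlier in this thread.
print(describe_return_code(-11))
```

On Linux, signal 11 is SIGSEGV, so under this assumption OrientContigs died from a segmentation fault, which would be consistent with the log ending without any error message from the program itself.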

skoren, Apr 24 '15 18:04

Dear Dr. Sergey,

I did as you suggested: I skipped the Scaffold step in runPipeline and reran the command. It completed successfully. I will check with my supervisor about sending you the data.

Only 0.006% is assigned to Bacteria, and 83% of bacteria are unassigned. Raw reads: 730 of 18,207,146. Contigs: 35 of 263,856.

A) To improve the annotation:

  1. Can I use a custom database instead of minikraken?
  2. If so, I need to place it under Utilities/DB/. Am I correct?

B) What is the drawback if I skip the Scaffold step for all my samples?

output of annotate tab

Ramanandan, Apr 24 '15 19:04

metAMOS supports several classifiers: http://metamos.readthedocs.org/en/v1.5rc3/content/programs.html

You would need to install the optional components to get most of them (PhyloSift, FCP, etc.). However, most are significantly slower than Kraken. You can also check how many of your sequences are mapped to your assembly. If a significant fraction cannot be mapped, you should add the -u option to runPipeline to classify the unmapped reads as well. The unmapped reads are in Postprocess/out/proba.lib1.unaligned.fasta and the aligned reads are in Postprocess/out/proba.lib1.contig.reads
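As a rough way to run that check, the sketch below counts the records in the unaligned FASTA file and divides by the total number of input reads. It assumes the file is standard FASTA with one ">" header per read, and that you supply the total read count yourself; the value used here is the figure quoted earlier in this thread and may not be the right denominator for your run (for example, if mates are counted separately). The function names are hypothetical.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines ('>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def unmapped_fraction(unaligned_fasta, total_reads):
    """Fraction of the input reads that did not map to the assembly."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return count_fasta_records(unaligned_fasta) / total_reads


if __name__ == "__main__":
    # Path as given above, relative to the metAMOS project directory.
    fasta = Path("Postprocess/out/proba.lib1.unaligned.fasta")
    total = 18_207_146  # assumption: total input reads, as quoted in this thread
    if fasta.exists():
        print(f"{unmapped_fraction(fasta, total):.1%} of reads unmapped")
```

If the printed fraction is large, that is the situation in which adding -u to runPipeline is worth the extra classification time.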

skoren, Apr 24 '15 19:04

Thanks Dr. Sergey. Yeah, I already installed optional components.

  1. Using FCP.
  2. Then using -u option in runPipeline to classify the unmapped reads.

Ramanandan, Apr 24 '15 20:04

Dear Dr. Sergey,

Due to data privacy, I am unable to send the sequences to you.

  1. I ran the command using FCP instead of the minikraken DB. I started the pipeline yesterday at 4 pm, and it is still at the Preprocess step. Does the FCP DB normally take many hours to process?

Ramanandan, Apr 28 '15 15:04

Yes, FCP is significantly slower than Kraken.

skoren, Apr 28 '15 16:04

Dear Dr. Sergey,

The above command with the FCP DB is running on my workstation.

I also got access to our cluster (which has 1 TB of RAM), so I plan to try the full Kraken DB in parallel. I downloaded the full Kraken DB from the following URL (ftp://ftp.cbcb.umd.edu/pub/data/treangen/allDBs.tar.gz).

As you described in issue 194 for the mini Kraken DB, I plan to extract the full Kraken DB to /home/prabhakaranra/metAMOS-1.5rc3/Utilities/DB/kraken/ and replace the mini Kraken DB with it. Am I correct?

I have been told to include fungal reference genomes in my database, which is why I am moving to the full Kraken DB, or to some other database in the future.

Ramanandan, Apr 28 '15 16:04

The full Kraken DB does not include fungal genomes. It includes complete RefSeq genomes for the bacterial, archaeal, and viral domains as well as H. sapiens. You can see this list in the Kraken manual: https://ccb.jhu.edu/software/kraken/MANUAL.html#standard-kraken-database

To include fungal genomes, you will need to build a custom Kraken database yourself. You can follow the Kraken manual: https://ccb.jhu.edu/software/kraken/MANUAL.html#custom-databases

and then place it in /home/prabhakaranra/metAMOS-1.5rc3/Utilities/DB/kraken/
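For orientation, the custom-database steps from the Kraken manual can be sketched as a dry run that only prints the kraken-build commands rather than executing them. The database name fungi_bacteria_db and the file fungal_genomes.fa are placeholders, the thread count is arbitrary, and the flags should be checked against the manual for your Kraken version. The manual also requires that added sequences carry identifiers Kraken can map to the taxonomy.

```python
import shlex


def kraken_build_commands(db_dir, fungal_fastas, threads=1):
    """Return the kraken-build invocations for a custom database, in order."""
    cmds = [
        # Fetch the NCBI taxonomy that Kraken uses to label sequences.
        ["kraken-build", "--download-taxonomy", "--db", db_dir],
        # Fetch the standard bacterial library.
        ["kraken-build", "--download-library", "bacteria", "--db", db_dir],
    ]
    # Add each user-supplied fungal FASTA file to the library.
    for fasta in fungal_fastas:
        cmds.append(["kraken-build", "--add-to-library", fasta, "--db", db_dir])
    # Build the database from everything collected so far.
    cmds.append(["kraken-build", "--build", "--threads", str(threads), "--db", db_dir])
    return cmds


# Placeholder names; nothing is downloaded or built by this script.
for cmd in kraken_build_commands("fungi_bacteria_db", ["fungal_genomes.fa"], threads=15):
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before committing to the downloads and the build, which can take a long time and a lot of memory for a large library.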

skoren, Apr 28 '15 16:04

Thanks Dr. Sergey, I will try to build the database with fungal and bacterial sequences.

Ramanandan, Apr 28 '15 23:04