Roary
Persistent "BLAST Database error" during Parallel all vs. all blast
Dear Andrew and Torsten,
I am using Roary version 3.13.0, which I installed with conda. I've been computing the alignments with a subset of 24 E. coli samples. However, I am worried about this persistent error:
Parallel all against all blast BLAST Database error: No alias or index file found for protein database [/scratch/devel/talioto/denovo_assemblies/CRE/analysis_FC/rel7_Roary_RAxML/Ecoli/out_mini5/7cMRe98Zya/output_contigs] in search path [/scratch/devel/talioto/denovo_assemblies/CRE/analysis_FC/rel7_Roary_RAxML/Ecoli/out_mini5:/scratch/project/devel/aateam/blastdbs:]
A colleague previously computed these alignments with 26 samples without getting this error, and the alignments were longer, comprising 3.01 Mb (on average) of the core genome, whereas the sequences in my core_gene_alignment.aln output are 2.25 Mb. The only difference in the input dataset is that 2 samples were discarded, and that doesn't explain the difference in alignment length.
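One way to compare the two runs directly is to print the length of every sequence in each core_gene_alignment.aln. The sketch below assumes the file is a plain multi-FASTA alignment; `aln_lengths` is a hypothetical helper name, not part of Roary. Gap characters are counted, so the numbers are alignment columns rather than ungapped bases.

```shell
# Hypothetical helper: print "<sequence name><TAB><length>" for each record
# of a FASTA alignment. Gaps ("-") are counted as alignment columns.
aln_lengths() {
    awk '
        /^>/ {
            # New record: flush the previous one, then reset the counter.
            if (name != "") print name "\t" len
            name = substr($0, 2)
            len = 0
            next
        }
        {
            # Sequence line: drop whitespace and add its length.
            gsub(/[ \t\r]/, "")
            len += length($0)
        }
        END { if (name != "") print name "\t" len }
    ' "$1"
}
```

For example, `aln_lengths out_roary/core_gene_alignment.aln | head` shows the first few lengths. Within a single alignment, all sequences should report the same value.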
We both used Prokka 1.12 GFF files as input and ran Roary as: roary -p 16 -f out_roary -e -n -v gff/*.gff
As you can see from the search path, I even added the standard NCBI databases (nt, nr...) to BLASTDB. However, my understanding is that blastp should be indexing the cd-hit output.
I've been checking https://metacpan.org/pod/Bio::Roary::ParallelAllAgainstAllBlast but it does not give me any clue about the error.
What exactly is happening here? Do you know why parallel blast is spitting out this error?
Thanks, Fernando
Same error here. @gitcruz, were you able to solve this?
Hi @fconstancias, I was on holiday and haven't fixed this yet...
Dear @fconstancias
We finally solved this. A colleague pointed out that the blastp "all vs. all" step requires file-locking capability, and the compute nodes on our cluster do not allow that on our main disk partition. The solution was to run Roary in $TMPDIR and, after it finishes, copy the entire output folder back to the working directory.
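A rough way to check whether a given directory allows file locking is to try taking a lock on a scratch file there. This is only a sketch: it assumes the `flock` utility (util-linux or busybox) is installed, `can_lock_in` is a hypothetical helper name, and the lock it tests may not be exactly the kind BLAST relies on, so a passing check is a hint rather than a guarantee.

```shell
# Hypothetical helper: return 0 if an advisory lock can be taken on a
# scratch file inside DIR, non-zero otherwise.
# Assumes the flock(1) utility is available on the node.
can_lock_in() {
    dir="$1"
    # Create a throwaway file in the directory under test.
    probe="$(mktemp "$dir/locktest.XXXXXX")" || return 2
    # Try a non-blocking lock and run a no-op command while holding it.
    flock -n "$probe" true
    status=$?
    rm -f "$probe"
    return "$status"
}
```

Running `can_lock_in "$PWD" && echo "locking OK"` on a compute node, and then the same with `"$TMPDIR"`, shows whether the two locations behave differently.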
I did something like this:
mkdir -p Ecoli/gff ;
cd Ecoli;
CWD=$PWD;
#1. copy all gffs for that sample (Ecoli) inside the gff folder
for sample in $(cat Ecoli.rel7.samples.txt); do echo $sample; cp /path-to-prokka/$sample/prokka_annotation.gff gff/$sample.gff ; done
#2. go to TMPDIR, where file locking works (the lack of it produces BLAST errors and shorter alignments), and run ROARY
cd $TMPDIR
mkdir -p out_roary ;
time roary -p 16 -f out_roary/ -a -e -n -v $CWD/gff/*.gff &> Ecoli.roary.log
wait
#3. copy back to the working directory from TMPDIR
cp -r out_roary $CWD; cp Ecoli.roary.log $CWD;
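Since the error above did not stop Roary from producing a (shorter) alignment, it may be worth scanning the log for it before trusting the output. A minimal sketch, assuming the message appears in the log exactly as "BLAST Database error"; `roary_log_clean` is a hypothetical helper name.

```shell
# Hypothetical helper: return 0 if LOG exists and contains no BLAST database
# errors, 1 if the error string is present, 2 if the log file is missing.
roary_log_clean() {
    log="$1"
    [ -f "$log" ] || return 2
    if grep -q "BLAST Database error" "$log"; then
        return 1
    fi
    return 0
}

# Example use, after roary has finished:
#   roary_log_clean Ecoli.roary.log || echo "BLAST errors found, check the run" >&2
```

A non-zero result means the run should be inspected before the alignment is used downstream.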
Best, F.