Annotation Paused
Hello,
I recently ran TORMES, and while the pipeline completed and I was able to generate an HTML report file and assembly.tgz files for each sample (3 samples total), I am missing what seems to be the bulk of the Prokka outputs. Although a directory was created for each sample, the only files within were the FNA file and the log for that sample. All three runs ended at the identical step, where there seemed to be an issue with the contig ID name (see representative log file below).
I was wondering if there is a way to solve this issue within TORMES and rerun a shorter version of the pipeline, perhaps starting from the assembled genomes, so that it also produces the annotation.
Any help would be greatly appreciated, and thank you for your time.
Best, Zack
Hi Zack,
The issue occurs because Prokka does not accept the long names of some of the contigs. This is rare, and I have never experienced it personally.
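To confirm that contig naming really is the problem, you can list the contig IDs that are longer than Prokka allows. This is a minimal sketch that assumes the limit is 37 characters (the GenBank-derived cap that, as far as we know, Prokka enforces; the Prokka error in your log should state the actual value). `list_long_ids` is a hypothetical helper, not part of TORMES or Prokka, and it treats the ID as the header text up to the first whitespace.

```shell
#!/bin/bash
# list_long_ids FASTA [MAX]
# Hypothetical helper: prints "<length><TAB><id>" for every contig ID
# (header text up to the first whitespace, without the leading '>')
# that is longer than MAX characters.
# MAX defaults to 37, ASSUMED here to be Prokka's contig-ID limit.
list_long_ids() {
  local fasta="$1" max="${2:-37}"
  awk -v max="$max" '
    /^>/ {
      id = substr($1, 2)   # first field of the header, minus the ">"
      if (length(id) > max) print length(id) "\t" id
    }' "$fasta"
}
```

For example, `list_long_ids Sample-1.fasta` prints the length and name of each over-long contig; no output means every ID is within the assumed limit. Pass a second argument to check against a different limit.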
You can rename the contigs so that Prokka accepts them and then rerun just the FASTA files through TORMES. I have written a script for you that chops off the first '.' and everything after it in each contig name (those decimals are only there for Bandage, which is software you can use to visualize SPAdes assemblies).
Copy this script into a file on your Linux computer. Type in:
nano fastacontigchange.sh
then paste this:
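#!/bin/bash
# usage: ./fastacontigchange.sh sequences.fasta > newsequences.fasta
# Cuts every FASTA header at its first '.', e.g.
#   >NODE_1_length_81586_cov_70.951870  ->  >NODE_1_length_81586_cov_70
# Sequence lines are copied through unchanged.
while IFS= read -r line || [[ -n "$line" ]]; do
  if [[ "${line:0:1}" == ">" ]]; then
    # header line: keep only the text before the first '.'
    printf '%s\n' "${line%%.*}"
  else
    printf '%s\n' "$line"
  fi
done < "$1"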
Then press Ctrl-X, then Y to save, and hit Enter.
Then make the script executable:
chmod +x fastacontigchange.sh
Then run your FASTA files through the script, sending the output to a new filename like so:
./fastacontigchange.sh Sample-1.fasta > Sample-1a.fasta
./fastacontigchange.sh Sample-2.fasta > Sample-2a.fasta
./fastacontigchange.sh Sample-3.fasta > Sample-3a.fasta
Do that for your three samples so you have three new files.
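With more samples, a loop saves typing. This is a sketch that assumes `fastacontigchange.sh` sits in the current directory and the inputs follow the `Sample-N.fasta` naming used here; `rename_all` is a hypothetical helper name.

```shell
#!/bin/bash
# rename_all SAMPLE...
# Hypothetical wrapper: for each sample name S, runs
#   ./fastacontigchange.sh S.fasta > Sa.fasta
# and stops (returning 1) at the first missing input or failed run.
rename_all() {
  local s
  for s in "$@"; do
    if [[ ! -f "$s.fasta" ]]; then
      echo "missing input: $s.fasta" >&2
      return 1
    fi
    ./fastacontigchange.sh "$s.fasta" > "${s}a.fasta" || return 1
  done
}
```

`rename_all Sample-1 Sample-2 Sample-3` produces `Sample-1a.fasta`, `Sample-2a.fasta` and `Sample-3a.fasta`. The sample names are listed explicitly rather than globbed, so rerunning it does not pick up the already-renamed `*a.fasta` files.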
Here is an example of the contig naming before and after the script:
before the script:
$ head Sample-1.fasta
>NODE_1_length_81586_cov_70.951870
GTGGGGTGCGGCCACCATGGCCGACAGGGGATTTCTGCCGGCGCGGTTCGGTAGCGGCGC
CAGAATCGTGCACTTTCCGCCCCATCCTTTGGGGCGCCCCATCCACTGGGCGCGCGTCAA
after the script:
$ head Sample-1a.fasta
>NODE_1_length_81586_cov_70
GTGGGGTGCGGCCACCATGGCCGACAGGGGATTTCTGCCGGCGCGGTTCGGTAGCGGCGC
CAGAATCGTGCACTTTCCGCCCCATCCTTTGGGGCGCCCCATCCACTGGGCGCGCGTCAA
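Since the script rewrites every header, a quick sanity check before rerunning TORMES is cheap: the renamed file should have the same number of contigs, identical sequence lines, and no duplicate IDs. SPAdes node numbers should keep the shortened names unique, but this sketch (hypothetical helper `check_renamed`) verifies that rather than assuming it.

```shell
#!/bin/bash
# check_renamed ORIGINAL.fasta RENAMED.fasta
# Hypothetical sanity check: both files must have the same number of header
# lines and identical sequence lines, and the renamed file must not contain
# duplicate contig IDs. Returns non-zero on the first failed check.
check_renamed() {
  local orig="$1" new="$2" dups
  if [[ $(grep -c '^>' "$orig") -ne $(grep -c '^>' "$new") ]]; then
    echo "contig count differs" >&2; return 1
  fi
  if ! cmp -s <(grep -v '^>' "$orig") <(grep -v '^>' "$new"); then
    echo "sequence lines differ" >&2; return 1
  fi
  # first whitespace-delimited word of each header = contig ID
  dups=$(grep '^>' "$new" | cut -d ' ' -f 1 | sort | uniq -d)
  if [[ -n "$dups" ]]; then
    echo "duplicate IDs:" >&2; printf '%s\n' "$dups" >&2; return 1
  fi
  echo "OK: $(grep -c '^>' "$new") contigs, no duplicate IDs"
}
```

`check_renamed Sample-1.fasta Sample-1a.fasta` prints a one-line summary on success and an error message on stderr otherwise.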
Then run those three new files through TORMES, and it should annotate them properly.
Thank you so much for the reply; I will try this and report back.
One question for the new TORMES run: since the contig FASTA files are not paired forward/reverse reads, should they be submitted as genomes with a corresponding metadata file?
Thanks again for the help.
Hi Zack,
In the metadata file, put the word GENOME in the Read1 column and the FASTA file name in the Read2 column. It's probably easier to just give you an example. :)
Samples Read1 Read2 Description
Sample-1a GENOME Sample-1a.fasta Reads from Sample-1a
Sample-2a GENOME Sample-2a.fasta Reads from Sample-2a
Sample-3a GENOME Sample-3a.fasta Reads from Sample-3a
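To our understanding of the format described on the TORMES wiki, the columns in this file are separated by tabs, which are easy to lose when copying a table from a web page. A minimal sketch that writes such a file for pre-assembled genomes; `write_genome_metadata`, the output file name and the "Assembly of ..." description are illustrative choices, not TORMES requirements.

```shell
#!/bin/bash
# write_genome_metadata OUTFILE FASTA...
# Hypothetical helper: writes a tab-separated metadata table in which every
# sample is a pre-assembled genome (Read1 = GENOME, Read2 = FASTA file).
# The sample name is the FASTA file name without directory and extension.
write_genome_metadata() {
  local out="$1" fasta name
  shift
  printf 'Samples\tRead1\tRead2\tDescription\n' > "$out"
  for fasta in "$@"; do
    name=$(basename "$fasta")
    name="${name%.*}"   # drop the extension, e.g. ".fasta"
    printf '%s\tGENOME\t%s\t%s\n' "$name" "$fasta" "Assembly of $name" >> "$out"
  done
}
```

`write_genome_metadata samples_metadata.txt Sample-1a.fasta Sample-2a.fasta Sample-3a.fasta` writes a header line plus one GENOME row per file. If you launch TORMES from a different directory, giving full paths to the FASTA files is probably safer.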
Hi Zack,
Brad is right! The issue comes from Prokka's handling of long contig names (such as the ones generated automatically by SPAdes). Please check whether Brad's solution fulfills your needs.
You can find a shortcut to generate metadata files here: https://github.com/nmquijada/tormes/wiki/Shortcut-to-generate-the-metadata-file-for-TORMES
Let us know! Narciso