Augustus
Augustus PPX mode: augustInvalid block no. error
Hi there,
I've made a small prototype Nextflow pipeline to run Augustus in PPX mode with custom protein profiles, via Docker or Singularity: https://github.com/photocyte/luciferase-PPX-predictor-nf
(Augustus Docker image from quay.io/biocontainers/augustus:3.4.0--pl5321hd8b735c_3, see https://quay.io/repository/biocontainers/augustus?tab=tags)
I think I was able to make a good .prfl file from a custom MSA in FASTA format, but Augustus errors out when I try to use it:
genome_fasta=Ilumi1.3-grep13255.fasta
prfl_file=elateroidea_luciferase_clade.msa.fa.prfl
augustus --species=fly --proteinprofile=${prfl_file} ${genome_fasta}
Command output:
# This output was generated with AUGUSTUS (version 3.4.0).
# AUGUSTUS is a gene prediction tool written by M. Stanke ([email protected]),
# O. Keller, S. König, L. Gerischer, L. Romoth and Katharina Hoff.
# Please cite: Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler (2008),
# Using native and syntenically mapped cDNA alignments to improve de novo gene finding
# Bioinformatics 24: 637-644, doi 10.1093/bioinformatics/btn013
# No extrinsic information on sequences given.
# Sources of extrinsic information: M RM
# Initializing the parameters using config directory /usr/local/config/ ...
# Using protein profile unknown
# --[0..54]--> unknown_A (134) <--[2..4]--> unknown_B (168) <--[5..32]--> unknown_C (71) <--[0..7]--> unknown_D (91) <--[6..12]--
# fly version. Using default transition matrix.
# Looks like Ilumi1.3-grep13255.fasta is in fasta format.
# We have hints for 0 sequences and for 0 of the sequences in the input set.
#
# ----- prediction on sequence number 1 (length = 366736, name = Ilumi1.3_Scaffold13255) -----
#
# Predicted genes for sequence number 1 on both strands
Command error:
augustus: ERROR
augustInvalid block no. in SubstateModel::blockNoOfB
The elateroidea_luciferase_clade.msa.fa.prfl file was made with this command:
msa_fasta=elateroidea_luciferase_clade.msa.fa
msa2prfl.pl --qij=/usr/local/config/profile/default.qij --prefix_from_seqnames --max_entropy=0.75 \
${msa_fasta} > ${msa_fasta}.prfl
Is there something wrong with the .prfl file I am creating? Relevant files attached:
Archive.zip
Hi,
I was able to reproduce your error and your protein profile looks fine to me. The issue seems to be related to the UTR prediction of Augustus in combination with the protein profile mode. I recommend turning off UTR prediction, e.g.:
genome_fasta=Ilumi1.3-grep13255.fasta
prfl_file=elateroidea_luciferase_clade.msa.fa.prfl
augustus --species=fly --proteinprofile=${prfl_file} --UTR=off ${genome_fasta}
If the UTRs are important to you, I can take a closer look at the code causing this bug, but this may take some time.
Best, Lars
Thank you! I can confirm that adding --UTR=off is a workaround. I imagine UTR training might be limited to highly curated models like --species=fly, but shutting it off explicitly seems like good practice in case of unexpected interactions like this one.
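For anyone wiring this into a script or pipeline, here is a minimal sketch of one way to make sure the flag is never forgotten. The `ppx_cmd` helper is hypothetical (it is not part of Augustus or of the pipeline linked above); it only prints the command line used in this thread, with --UTR=off always included, so the result can be inspected before running it.

```shell
# Hypothetical helper (not part of Augustus): print the PPX command line
# with UTR prediction explicitly disabled, as suggested in this thread.
ppx_cmd() {
    species=$1        # e.g. fly
    prfl_file=$2      # protein profile produced by msa2prfl.pl
    genome_fasta=$3   # genome sequence in FASTA format
    printf 'augustus --species=%s --proteinprofile=%s --UTR=off %s\n' \
        "$species" "$prfl_file" "$genome_fasta"
}

# Print the command for the files from this thread
ppx_cmd fly elateroidea_luciferase_clade.msa.fa.prfl Ilumi1.3-grep13255.fasta
```

The printed line can be pasted into a shell or into the script block of a pipeline process. It assumes file names without spaces.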