funannotate
SignalP results not ending up in final annotation
Are you using the latest release? 1.8.17
Describe the bug SignalP 6 annotation runs and detects secreted proteins, but the respective final annotation does not contain anything about it, and neither does the stats.json file.
[May 24 12:47 PM]: Predicting secreted proteins with SignalP
[May 24 01:53 PM]: 1,154 secretome and 0 transmembane annotations added
stats.json:
"functional": {
"go_terms": 7233,
"interproscan": 9057,
"eggnog": 10620,
"pfam": 7905,
"cazyme": 541,
"merops": 379,
"busco": 1300,
"secretion": 0
},
SignalP results file:
AlAUS0001_000005-T1 AlAUS0001_000005 SP 0.045362 0.954630 CS pos: 25-26. Pr: 0.8822
AlAUS0001_000006-T1 AlAUS0001_000006 SP 0.000272 0.999683 CS pos: 16-17. Pr: 0.9835
Final annotation file for those two:
AlAUS0001_ctg01 funannotate gene 36364 39102 . - . ID=AlAUS0001_000005;
AlAUS0001_ctg01 funannotate mRNA 36364 39102 . - . ID=AlAUS0001_000005-T1;Parent=AlAUS0001_000005;product=hypothetical protein;Ontology_term=GO:0004553,GO:0005975;Dbxref=InterPro:IPR006103,InterPro:IPR008964,InterPro:IPR023232,InterPro:IPR017853,PFAM:PF02837,InterPro:IPR036156,InterPro:IPR048229,PFAM:PF00703,InterPro:IPR013783,InterPro:IPR006104,PFAM:PF18565,InterPro:IPR006101,PFAM:PF16355,InterPro:IPR032311,InterPro:IPR051913,InterPro:IPR040605,InterPro:IPR006102,PFAM:PF02836,InterPro:IPR008979;EC_number=3.2.1.23;note=COG:G,EggNog:ENOG503P0GE,CAZy:GH2;
AlAUS0001_ctg01 funannotate exon 37267 39102 . - . ID=AlAUS0001_000005-T1.exon1;Parent=AlAUS0001_000005-T1;
AlAUS0001_ctg01 funannotate exon 37071 37215 . - . ID=AlAUS0001_000005-T1.exon2;Parent=AlAUS0001_000005-T1;
AlAUS0001_ctg01 funannotate exon 36364 37016 . - . ID=AlAUS0001_000005-T1.exon3;Parent=AlAUS0001_000005-T1;
AlAUS0001_ctg01 funannotate CDS 37267 39102 . - 0 ID=AlAUS0001_000005-T1.cds;Parent=AlAUS0001_000005-T1;
AlAUS0001_ctg01 funannotate CDS 37071 37215 . - 0 ID=AlAUS0001_000005-T1.cds;Parent=AlAUS0001_000005-T1;
AlAUS0001_ctg01 funannotate CDS 36364 37016 . - 2 ID=AlAUS0001_000005-T1.cds;Parent=AlAUS0001_000005-T1;
AlAUS0001_ctg01 funannotate gene 42523 44365 . + . ID=AlAUS0001_000006;
AlAUS0001_ctg01 funannotate mRNA 42523 44365 . + . ID=AlAUS0001_000006-T1;Parent=AlAUS0001_000006;product=hypothetical protein;Dbxref=PFAM:PF00144,InterPro:IPR001466,InterPro:IPR012338,InterPro:IPR051478;note=EggNog:ENOG503Q3SS,COG:V,MEROPS:MER0026262;
AlAUS0001_ctg01 funannotate exon 42523 43012 . + . ID=AlAUS0001_000006-T1.exon1;Parent=AlAUS0001_000006-T1;
AlAUS0001_ctg01 funannotate exon 43081 43221 . + . ID=AlAUS0001_000006-T1.exon2;Parent=AlAUS0001_000006-T1;
AlAUS0001_ctg01 funannotate exon 43281 44365 . + . ID=AlAUS0001_000006-T1.exon3;Parent=AlAUS0001_000006-T1;
AlAUS0001_ctg01 funannotate CDS 42523 43012 . + 0 ID=AlAUS0001_000006-T1.cds;Parent=AlAUS0001_000006-T1;
AlAUS0001_ctg01 funannotate CDS 43081 43221 . + 2 ID=AlAUS0001_000006-T1.cds;Parent=AlAUS0001_000006-T1;
AlAUS0001_ctg01 funannotate CDS 43281 44365 . + 2 ID=AlAUS0001_000006-T1.cds;Parent=AlAUS0001_000006-T1;
What command did you issue?
for file in *.fasta
do
ID="${file%%.fasta}" && \
echo "++++++++++++++++ starting with sample $ID +++++++++++++++++++++++++" && \
funannotate predict --cpus $(nproc) -i $file -o funannotate_$ID --species "Ascochyta lentis" --augustus_species Alentis --name $ID --isolate $ID --protein_evidence /data/databases/nonredundant_lentis_proteins.fasta /data/databases/uniprot_sprot.fasta --force && \
funannotate iprscan --cpus $(nproc) -i funannotate_$ID -m local && \
funannotate annotate --cpus $(nproc) -i funannotate_$ID
done
Logfiles
OS/Install Information
Ubuntu 24.04.1 LTS
-------------------------------------------------------
Checking dependencies for 1.8.17
-------------------------------------------------------
You are running Python v 3.9.19. Now checking python packages...
biopython: 1.79
goatools: 1.4.12
matplotlib: 3.9.4
natsort: 8.4.0
numpy: 1.26.4
pandas: 2.2.3
psutil: 7.0.0
requests: 2.32.3
scikit-learn: 1.6.1
scipy: 1.13.1
seaborn: 0.13.2
All 11 python packages installed
You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50
Clone: 0.46
DBD::SQLite: 1.76
DBD::mysql: 4.050
DBI: 1.643
DB_File: 1.858
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.58
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.68
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.03
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.17
YAML: 1.30
local::lib: 2.000029
threads: 2.25
threads::shared: 1.61
All 27 Perl modules installed
Checking Environmental Variables...
$FUNANNOTATE_DB=/data/databases/
$PASAHOME=/data/mamba_envs/envs/funannotate/opt/pasa-2.5.3
$TRINITY_HOME=/data/mamba_envs/envs/funannotate/opt/trinity-2.15.2
$EVM_HOME=/data/mamba_envs/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/data/mamba_envs/envs/funannotate/config/
$GENEMARK_PATH=/opt/genemark/current/
All 6 environmental variables are set
Checking external dependencies...
CodingQuarry: 2.0
Trinity: 2.15.2
augustus: 3.5.0
bamtools: bamtools 2.5.2
bedtools: bedtools v2.31.1
blat: BLAT v39x1
diamond: 2.1.10
emapper.py: 2.1.12
ete3: 3.1.3
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2024-11-20
gmes_petap.pl: 4.71_lic
hisat2: 2.2.1
hmmscan: HMMER 3.4 (Aug 2023)
hmmsearch: HMMER 3.4 (Aug 2023)
java: 18.0.2.1
kallisto: 0.46.1
mafft: v7.526 (2024/Apr/26)
makeblastdb: makeblastdb 2.16.0+
minimap2: 2.28-r1209
pigz: 2.8
proteinortho: 6.3.4
pslCDnaFilter: no way to determine
salmon: salmon 1.10.3
samtools: samtools 1.21
signalp: 6.0
snap: 2006-07-28
stringtie: 2.2.3
tRNAscan-SE: 2.0.12 (Nov 2022)
tantan: tantan 51
tbl2asn: 25.8
tblastn: tblastn 2.16.0+
trimal: trimAl v1.5.rev0 build[2024-05-27]
trimmomatic: 0.39
All 37 external dependencies are installed
Is there anything else in the logfiles in the logfiles folder within the funannotate run?
It seems to be making it into the other parts, so this is a bit mysterious.
It may be a parsing problem during creation of the all.annotations.txt file.
All the secreted entries have an additional bit of text separated by a space, i.e. AlAUS0001_000005-T1 SPACE AlAUS0001_000005 TAB note TAB SECRETED:SignalP(1-25).
Below is a short extract showing where the secreted block starts:
AlAUS0001_010877-T1 go_process amino acid transmembrane transport|0003333||IEA
AlAUS0001_010899-T1 db_xref InterPro:IPR037651
AlAUS0001_010899-T1 go_function ATP-dependent H2AZ histone chaperone activity|0140849||IEA
AlAUS0001_010899-T1 go_component Swr1 complex|0000812||IEA
AlAUS0001_010899-T1 go_process chromatin remodeling|0006338||IEA
AlAUS0001_000005-T1 AlAUS0001_000005 note SECRETED:SignalP(1-25)
AlAUS0001_000006-T1 AlAUS0001_000006 note SECRETED:SignalP(1-16)
AlAUS0001_000019-T1 AlAUS0001_000019 note SECRETED:SignalP(1-28)
AlAUS0001_000024-T1 AlAUS0001_000024 note SECRETED:SignalP(1-25)
AlAUS0001_000034-T1 AlAUS0001_000034 note SECRETED:SignalP(1-22)
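One quick way to test the parsing hypothesis is to look for IDs in the first tab-separated column that contain whitespace. The sketch below assumes the three-column, tab-separated layout described above (ID TAB field TAB value). `find_malformed_ids` is a hypothetical helper written for this thread, not part of funannotate.

```python
def find_malformed_ids(lines):
    """Return first-column IDs that contain whitespace (hypothetical helper)."""
    bad = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # Assumption: columns are tab-separated and the ID is the first column.
        first_col = line.split("\t", 1)[0]
        if len(first_col.split()) > 1:
            bad.append(first_col)
    return bad


sample = [
    "AlAUS0001_010899-T1\tdb_xref\tInterPro:IPR037651\n",
    "AlAUS0001_000005-T1 AlAUS0001_000005\tnote\tSECRETED:SignalP(1-25)\n",
]
print(find_malformed_ids(sample))
```

If the secreted entries are the only ones reported, that would be consistent with the IDs not matching the transcript IDs used elsewhere in the annotation, although it does not prove that this is why they are dropped.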
These are all the log files that were generated, also attached below:
funannotate-annotate.log funannotate-predict.log funannotate-p2g.log funannotate-EVM.log busco.log augustus-parallel.log
That suggests something unexpected in how SignalP is being run. Mine looks like this, with only the transcript name present:
ACMYSQ_000040-T1 note SECRETED:SignalP(1-18)
ACMYSQ_000048-T1 note SECRETED:SignalP(1-20)
ACMYSQ_000057-T1 note SECRETED:SignalP(1-21)
ACMYSQ_000092-T1 note SECRETED:SignalP(1-16)
and my signalp results look like this
more signalp.results.txt
# SignalP-6.0 Organism: Eukarya Timestamp: 20250422220620
# ID Prediction OTHER SP(Sec/SPI) CS Position
ACMYSQ_000001-T1 ACMYSQ_000001 OTHER 1.000000 0.000003
ACMYSQ_000002-T1 ACMYSQ_000002 OTHER 1.000000 0.000000
ACMYSQ_000003-T1 ACMYSQ_000003 OTHER 1.000000 0.000005
ACMYSQ_000004-T1 ACMYSQ_000004 OTHER 1.000000 0.000000
ACMYSQ_000005-T1 ACMYSQ_000005 OTHER 1.000000 0.000001
ACMYSQ_000006-T1 ACMYSQ_000006 OTHER 1.000000 0.000000
ACMYSQ_000007-T1 ACMYSQ_000007 OTHER 0.999996 0.000011
ACMYSQ_000008-T1 ACMYSQ_000008 OTHER 0.993794 0.006229
The code is actually confusing to me: I don't see where the gene name is stripped out, which I would expect to happen with split(' '). @nextgenusfs
https://github.com/nextgenusfs/funannotate/blob/033a883081a83a161798ecc17eaf77b16b5c552b/funannotate/library.py#L7310 https://github.com/nextgenusfs/funannotate/blob/033a883081a83a161798ecc17eaf77b16b5c552b/funannotate/library.py#L7313
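For illustration, the kind of parsing under discussion could look like the sketch below. This is not the code at the linked lines of library.py. `parse_signalp6_line` is a hypothetical helper, and it assumes tab-separated columns in the order given by the SignalP 6 header (ID, Prediction, OTHER, SP(Sec/SPI), CS Position), with the rest of the FASTA header carried along in the ID column after a space.

```python
import re

# Matches the cleavage-site field, e.g. "CS pos: 25-26. Pr: 0.8822"
CS_PATTERN = re.compile(r"CS pos: (\d+)-(\d+)")


def parse_signalp6_line(line):
    """Return (transcript_id, signal_peptide_end) or None (hypothetical helper)."""
    if line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    # Assumption: five tab-separated columns, prediction label in column 2.
    if len(cols) < 5 or cols[1] != "SP" or not cols[0].strip():
        return None
    # Keep only the first whitespace-delimited token of the ID column,
    # which drops the trailing gene name.
    transcript_id = cols[0].split()[0]
    match = CS_PATTERN.search(cols[4])
    if match is None:
        return None
    return transcript_id, int(match.group(1))


line = "AlAUS0001_000005-T1 AlAUS0001_000005\tSP\t0.045362\t0.954630\tCS pos: 25-26. Pr: 0.8822\n"
print(parse_signalp6_line(line))  # ('AlAUS0001_000005-T1', 25)
```

Splitting the ID column on whitespace rather than on a single space also tolerates repeated spaces in the header.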
Can you post a snippet of annotations.secretome.txt, signalp.results.txt, and annotations.transmembrane.txt? I'm also trying to see why there are no TMs. I can't tell whether any silly line-ending problems here would cause it either.
secretome.txt
AlAUS0001_000005-T1 AlAUS0001_000005 note SECRETED:SignalP(1-25)
AlAUS0001_000006-T1 AlAUS0001_000006 note SECRETED:SignalP(1-16)
AlAUS0001_000019-T1 AlAUS0001_000019 note SECRETED:SignalP(1-28)
AlAUS0001_000024-T1 AlAUS0001_000024 note SECRETED:SignalP(1-25)
AlAUS0001_000034-T1 AlAUS0001_000034 note SECRETED:SignalP(1-22)
AlAUS0001_000037-T1 AlAUS0001_000037 note SECRETED:SignalP(1-26)
signalp.results.txt
# ID Prediction OTHER SP(Sec/SPI) CS Position
AlAUS0001_000001-T1 AlAUS0001_000001 OTHER 1.000000 0.000000
AlAUS0001_000002-T1 AlAUS0001_000002 OTHER 1.000000 0.000001
AlAUS0001_000003-T1 AlAUS0001_000003 OTHER 1.000000 0.000000
AlAUS0001_000004-T1 AlAUS0001_000004 OTHER 1.000000 0.000000
AlAUS0001_000005-T1 AlAUS0001_000005 SP 0.045362 0.954630 CS pos: 25-26. Pr: 0.8822
I don't have a transmembrane annotations file.
My secretome file does indeed look different from yours.
Something must be messing up the splitting of the ID, and I don't quite know where this is coming from. Can you make sure you have the latest version of the code from GitHub installed? I can't tell for sure whether anything like this was fixed for SignalP 6 parsing since the 1.8.17 release...
conda activate funannotate # or however else you have the funannotate env loaded
python -m pip install git+https://github.com/nextgenusfs/funannotate.git
The transmembrane domain prediction will only happen if Phobius is installed. You can check what is or isn't installed with:
funannotate check --show-versions
OK, I installed the latest version as per your command above and reran the InterProScan and annotation steps; still the same result.
I'll try to look at some code; it seems like it needs a different parsing bit in there. If you want to try to force this through, you can probably fix it with a simple Perl/Python one-liner that corrects the annotations.txt file or the secretome.txt file, then re-run the annotate step. I don't have the bandwidth right now to do any code fixes, so I'm not sure what else to tell you about the source of your issue.
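A minimal sketch of the suggested workaround, written as a short Python script rather than a one-liner: it rewrites the first tab-separated column of each line so that only the transcript ID remains. The function names are hypothetical and the tab-separated layout is an assumption taken from the extracts in this thread.

```python
def strip_extra_id(line):
    """Keep only the first whitespace-delimited token of the ID column."""
    cols = line.rstrip("\n").split("\t")
    if cols[0].strip():
        cols[0] = cols[0].split()[0]
    return "\t".join(cols) + "\n"


def fix_annotation_file(src, dst):
    """Copy src to dst with cleaned IDs (hypothetical helper, not funannotate code)."""
    with open(src) as infile, open(dst, "w") as outfile:
        for line in infile:
            outfile.write(strip_extra_id(line))


print(repr(strip_extra_id("AlAUS0001_000005-T1 AlAUS0001_000005\tnote\tSECRETED:SignalP(1-25)\n")))
```

Writing to a new file and keeping the original as a backup is the safer choice. Where the secretome file lives inside the output folder, and whether the annotate step reuses an edited file or regenerates it, is not stated in this thread, so check that before relying on this.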