funannotate

SignalP results not ending up in final annotation

Open JWDebler opened this issue 6 months ago • 9 comments

Are you using the latest release? 1.8.17

Describe the bug

SignalP 6 annotation runs and detects secreted proteins, but the final annotation does not contain anything about them, and neither does the stats.json file.

[May 24 12:47 PM]: Predicting secreted proteins with SignalP 
[May 24 01:53 PM]: 1,154 secretome and 0 transmembane annotations added

stats.json:

"functional": {
                "go_terms": 7233,
                "interproscan": 9057,
                "eggnog": 10620,
                "pfam": 7905,
                "cazyme": 541,
                "merops": 379,
                "busco": 1300,
                "secretion": 0
            },

Signal P results file:

AlAUS0001_000005-T1 AlAUS0001_000005	SP	0.045362	0.954630	CS pos: 25-26. Pr: 0.8822
AlAUS0001_000006-T1 AlAUS0001_000006	SP	0.000272	0.999683	CS pos: 16-17. Pr: 0.9835

Final annotation file for those two:

AlAUS0001_ctg01	funannotate	gene	36364	39102	.	-	.	ID=AlAUS0001_000005;
AlAUS0001_ctg01	funannotate	mRNA	36364	39102	.	-	.	ID=AlAUS0001_000005-T1;Parent=AlAUS0001_000005;product=hypothetical protein;Ontology_term=GO:0004553,GO:0005975;Dbxref=InterPro:IPR006103,InterPro:IPR008964,InterPro:IPR023232,InterPro:IPR017853,PFAM:PF02837,InterPro:IPR036156,InterPro:IPR048229,PFAM:PF00703,InterPro:IPR013783,InterPro:IPR006104,PFAM:PF18565,InterPro:IPR006101,PFAM:PF16355,InterPro:IPR032311,InterPro:IPR051913,InterPro:IPR040605,InterPro:IPR006102,PFAM:PF02836,InterPro:IPR008979;EC_number=3.2.1.23;note=COG:G,EggNog:ENOG503P0GE,CAZy:GH2;
AlAUS0001_ctg01	funannotate	exon	37267	39102	.	-	.	ID=AlAUS0001_000005-T1.exon1;Parent=AlAUS0001_000005-T1;
AlAUS0001_ctg01	funannotate	exon	37071	37215	.	-	.	ID=AlAUS0001_000005-T1.exon2;Parent=AlAUS0001_000005-T1;
AlAUS0001_ctg01	funannotate	exon	36364	37016	.	-	.	ID=AlAUS0001_000005-T1.exon3;Parent=AlAUS0001_000005-T1;
AlAUS0001_ctg01	funannotate	CDS	37267	39102	.	-	0	ID=AlAUS0001_000005-T1.cds;Parent=AlAUS0001_000005-T1;
AlAUS0001_ctg01	funannotate	CDS	37071	37215	.	-	0	ID=AlAUS0001_000005-T1.cds;Parent=AlAUS0001_000005-T1;
AlAUS0001_ctg01	funannotate	CDS	36364	37016	.	-	2	ID=AlAUS0001_000005-T1.cds;Parent=AlAUS0001_000005-T1;
AlAUS0001_ctg01	funannotate	gene	42523	44365	.	+	.	ID=AlAUS0001_000006;
AlAUS0001_ctg01	funannotate	mRNA	42523	44365	.	+	.	ID=AlAUS0001_000006-T1;Parent=AlAUS0001_000006;product=hypothetical protein;Dbxref=PFAM:PF00144,InterPro:IPR001466,InterPro:IPR012338,InterPro:IPR051478;note=EggNog:ENOG503Q3SS,COG:V,MEROPS:MER0026262;
AlAUS0001_ctg01	funannotate	exon	42523	43012	.	+	.	ID=AlAUS0001_000006-T1.exon1;Parent=AlAUS0001_000006-T1;
AlAUS0001_ctg01	funannotate	exon	43081	43221	.	+	.	ID=AlAUS0001_000006-T1.exon2;Parent=AlAUS0001_000006-T1;
AlAUS0001_ctg01	funannotate	exon	43281	44365	.	+	.	ID=AlAUS0001_000006-T1.exon3;Parent=AlAUS0001_000006-T1;
AlAUS0001_ctg01	funannotate	CDS	42523	43012	.	+	0	ID=AlAUS0001_000006-T1.cds;Parent=AlAUS0001_000006-T1;
AlAUS0001_ctg01	funannotate	CDS	43081	43221	.	+	2	ID=AlAUS0001_000006-T1.cds;Parent=AlAUS0001_000006-T1;
AlAUS0001_ctg01	funannotate	CDS	43281	44365	.	+	2	ID=AlAUS0001_000006-T1.cds;Parent=AlAUS0001_000006-T1;

What command did you issue?

for file in *.fasta 
do
ID="${file%%.fasta}" && \
echo "++++++++++++++++ starting with sample $ID +++++++++++++++++++++++++" && \
funannotate predict --cpus $(nproc) -i $file -o funannotate_$ID --species "Ascochyta lentis" --augustus_species Alentis --name $ID --isolate $ID --protein_evidence /data/databases/nonredundant_lentis_proteins.fasta /data/databases/uniprot_sprot.fasta --force && \
funannotate iprscan --cpus $(nproc) -i funannotate_$ID -m local && \
funannotate annotate --cpus $(nproc) -i funannotate_$ID 
done

Logfiles

funannotate-annotate.log

OS/Install Information

 Ubuntu 24.04.1 LTS
 -------------------------------------------------------
Checking dependencies for 1.8.17
-------------------------------------------------------
You are running Python v 3.9.19. Now checking python packages...
biopython: 1.79
goatools: 1.4.12
matplotlib: 3.9.4
natsort: 8.4.0
numpy: 1.26.4
pandas: 2.2.3
psutil: 7.0.0
requests: 2.32.3
scikit-learn: 1.6.1
scipy: 1.13.1
seaborn: 0.13.2
All 11 python packages installed


You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50
Clone: 0.46
DBD::SQLite: 1.76
DBD::mysql: 4.050
DBI: 1.643
DB_File: 1.858
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.58
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.68
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.03
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.17
YAML: 1.30
local::lib: 2.000029
threads: 2.25
threads::shared: 1.61
All 27 Perl modules installed


Checking Environmental Variables...
$FUNANNOTATE_DB=/data/databases/
$PASAHOME=/data/mamba_envs/envs/funannotate/opt/pasa-2.5.3
$TRINITY_HOME=/data/mamba_envs/envs/funannotate/opt/trinity-2.15.2
$EVM_HOME=/data/mamba_envs/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/data/mamba_envs/envs/funannotate/config/
$GENEMARK_PATH=/opt/genemark/current/
All 6 environmental variables are set
Checking external dependencies...
CodingQuarry: 2.0
Trinity: 2.15.2
augustus: 3.5.0
bamtools: bamtools 2.5.2
bedtools: bedtools v2.31.1
blat: BLAT v39x1
diamond: 2.1.10
emapper.py: 2.1.12
ete3: 3.1.3
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2024-11-20
gmes_petap.pl: 4.71_lic
hisat2: 2.2.1
hmmscan: HMMER 3.4 (Aug 2023)
hmmsearch: HMMER 3.4 (Aug 2023)
java: 18.0.2.1
kallisto: 0.46.1
mafft: v7.526 (2024/Apr/26)
makeblastdb: makeblastdb 2.16.0+
minimap2: 2.28-r1209
pigz: 2.8
proteinortho: 6.3.4
pslCDnaFilter: no way to determine
salmon: salmon 1.10.3
samtools: samtools 1.21
signalp: 6.0
snap: 2006-07-28
stringtie: 2.2.3
tRNAscan-SE: 2.0.12 (Nov 2022)
tantan: tantan 51
tbl2asn: 25.8
tblastn: tblastn 2.16.0+
trimal: trimAl v1.5.rev0 build[2024-05-27]
trimmomatic: 0.39
All 37 external dependencies are installed

JWDebler avatar May 24 '25 18:05 JWDebler

Is there anything else in the log files in the logfiles folder of the funannotate run?

It seems to be making it into the other parts, so this is a bit mysterious.

hyphaltip avatar May 26 '25 05:05 hyphaltip

It may be a parsing problem during creation of the all.annotations.txt file. All the secreted entries have an additional bit of text separated by a space, i.e. AlAUS0001_000005-T1 SPACE AlAUS0001_000005 TAB note TAB SECRETED:SignalP(1-25)

Below is a short extract showing where the secreted block starts:

AlAUS0001_010877-T1	go_process	amino acid transmembrane transport|0003333||IEA
AlAUS0001_010899-T1	db_xref	InterPro:IPR037651
AlAUS0001_010899-T1	go_function	ATP-dependent H2AZ histone chaperone activity|0140849||IEA
AlAUS0001_010899-T1	go_component	Swr1 complex|0000812||IEA
AlAUS0001_010899-T1	go_process	chromatin remodeling|0006338||IEA
AlAUS0001_000005-T1 AlAUS0001_000005	note	SECRETED:SignalP(1-25)
AlAUS0001_000006-T1 AlAUS0001_000006	note	SECRETED:SignalP(1-16)
AlAUS0001_000019-T1 AlAUS0001_000019	note	SECRETED:SignalP(1-28)
AlAUS0001_000024-T1 AlAUS0001_000024	note	SECRETED:SignalP(1-25)
AlAUS0001_000034-T1 AlAUS0001_000034	note	SECRETED:SignalP(1-22)
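
To illustrate the suspected mismatch, here is a minimal Python sketch. It is not funannotate's actual code, and the helper name `first_column_ids` is hypothetical. It assumes annotations are keyed by the first tab-separated column, in which case the secretome rows above land under a key that still contains the space and the gene name, so a lookup by bare transcript ID would miss them.

```python
def first_column_ids(lines):
    """Return the first tab-separated field of each non-empty line."""
    return [line.split("\t")[0] for line in lines if line.strip()]


rows = [
    "AlAUS0001_010899-T1\tdb_xref\tInterPro:IPR037651",
    "AlAUS0001_000005-T1 AlAUS0001_000005\tnote\tSECRETED:SignalP(1-25)",
]

ids = first_column_ids(rows)

# The secretome row's key still carries the gene name after a space,
# so the bare transcript ID is not among the keys.
has_bare_id = "AlAUS0001_000005-T1" in ids

# Keeping only the first whitespace-separated token recovers the transcript ID.
cleaned = [identifier.split()[0] for identifier in ids]
```

If this assumption holds, it would explain why the log reports 1,154 secretome annotations while none of them reach the final GFF3 or the stats.json count.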

These are all the logfiles that were generated, also attached below

funannotate-annotate.log funannotate-predict.log funannotate-p2g.log funannotate-EVM.log busco.log augustus-parallel.log

JWDebler avatar May 26 '25 06:05 JWDebler

Something unexpected seems to be happening in how SignalP is being run. Mine looks like this, with only the transcript name present:

ACMYSQ_000040-T1	note	SECRETED:SignalP(1-18)
ACMYSQ_000048-T1	note	SECRETED:SignalP(1-20)
ACMYSQ_000057-T1	note	SECRETED:SignalP(1-21)
ACMYSQ_000092-T1	note	SECRETED:SignalP(1-16)

and my signalp results look like this

more signalp.results.txt
# SignalP-6.0	Organism: Eukarya	Timestamp: 20250422220620
# ID	Prediction	OTHER	SP(Sec/SPI)	CS Position
ACMYSQ_000001-T1 ACMYSQ_000001	OTHER	1.000000	0.000003
ACMYSQ_000002-T1 ACMYSQ_000002	OTHER	1.000000	0.000000
ACMYSQ_000003-T1 ACMYSQ_000003	OTHER	1.000000	0.000005
ACMYSQ_000004-T1 ACMYSQ_000004	OTHER	1.000000	0.000000
ACMYSQ_000005-T1 ACMYSQ_000005	OTHER	1.000000	0.000001
ACMYSQ_000006-T1 ACMYSQ_000006	OTHER	1.000000	0.000000
ACMYSQ_000007-T1 ACMYSQ_000007	OTHER	0.999996	0.000011
ACMYSQ_000008-T1 ACMYSQ_000008	OTHER	0.993794	0.006229

The code is actually confusing to me, as I don't see where the gene name is stripped out; I would expect a split(' ') somewhere. @nextgenusfs

https://github.com/nextgenusfs/funannotate/blob/033a883081a83a161798ecc17eaf77b16b5c552b/funannotate/library.py#L7310
https://github.com/nextgenusfs/funannotate/blob/033a883081a83a161798ecc17eaf77b16b5c552b/funannotate/library.py#L7313
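
For reference, the kind of parsing being discussed can be sketched as follows. This is not the code at the links above; the function name `signalp6_to_note` is hypothetical, and the output format is inferred only from the signalp.results.txt and secretome lines quoted in this thread. The sketch assumes tab-separated fields and takes the first space-separated token of the ID field as the transcript name.

```python
import re


def signalp6_to_note(line):
    """Turn one SignalP 6 result line into a secretome note line, or None.

    Assumes tab-separated fields: ID, Prediction, OTHER, SP(Sec/SPI), CS Position.
    """
    if line.startswith("#") or not line.strip():
        return None  # header or blank line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5 or fields[1] != "SP":
        return None  # no signal peptide predicted
    match = re.search(r"CS pos: (\d+)-(\d+)", fields[4])
    if match is None:
        return None  # cleavage site not reported
    # The ID field looks like "transcript gene"; keep only the transcript.
    transcript_id = fields[0].split(" ")[0]
    return f"{transcript_id}\tnote\tSECRETED:SignalP(1-{match.group(1)})"


example = (
    "AlAUS0001_000005-T1 AlAUS0001_000005\tSP\t0.045362\t0.954630\t"
    "CS pos: 25-26. Pr: 0.8822"
)
note = signalp6_to_note(example)
```

With the `split(" ")[0]` step the note is keyed by the transcript name alone, which matches the secretome lines shown for the ACMYSQ run.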

hyphaltip avatar Jun 02 '25 18:06 hyphaltip

Can you post a snippet of annotations.secretome.txt, signalp.results.txt, and annotations.transmembrane.txt? I'm also trying to see why there are no TMs. I can't tell whether silly line-ending problems could be causing this either.

hyphaltip avatar Jun 02 '25 18:06 hyphaltip

secretome.txt

AlAUS0001_000005-T1 AlAUS0001_000005	note	SECRETED:SignalP(1-25)
AlAUS0001_000006-T1 AlAUS0001_000006	note	SECRETED:SignalP(1-16)
AlAUS0001_000019-T1 AlAUS0001_000019	note	SECRETED:SignalP(1-28)
AlAUS0001_000024-T1 AlAUS0001_000024	note	SECRETED:SignalP(1-25)
AlAUS0001_000034-T1 AlAUS0001_000034	note	SECRETED:SignalP(1-22)
AlAUS0001_000037-T1 AlAUS0001_000037	note	SECRETED:SignalP(1-26)

signalp.results.txt

# ID	Prediction	OTHER	SP(Sec/SPI)	CS Position
AlAUS0001_000001-T1 AlAUS0001_000001	OTHER	1.000000	0.000000	
AlAUS0001_000002-T1 AlAUS0001_000002	OTHER	1.000000	0.000001	
AlAUS0001_000003-T1 AlAUS0001_000003	OTHER	1.000000	0.000000	
AlAUS0001_000004-T1 AlAUS0001_000004	OTHER	1.000000	0.000000	
AlAUS0001_000005-T1 AlAUS0001_000005	SP	0.045362	0.954630	CS pos: 25-26. Pr: 0.8822

I don't have a transmembrane annotations file.

My secretome file does indeed look different from yours.

JWDebler avatar Jun 03 '25 08:06 JWDebler

Something must be breaking the splitting of the ID, and I don't quite know where it is coming from. Can you make sure you have the latest version of the code from GitHub installed? I can't tell for sure whether anything like this was fixed for SignalP 6 parsing since the 1.8.17 release.

conda activate funannotate # or however else you have the funannotate env loaded
python -m pip install git+https://github.com/nextgenusfs/funannotate.git

hyphaltip avatar Jun 04 '25 04:06 hyphaltip

The transmembrane domain prediction will only happen if Phobius is installed. You can check what is or isn't installed with:

funannotate check --show-versions

hyphaltip avatar Jun 04 '25 04:06 hyphaltip

OK, I installed the latest version as per your command above and reran the InterProScan and annotate steps, but got the same result.

JWDebler avatar Jun 04 '25 22:06 JWDebler

I'll try to look at some code. It seems like it needs a different parsing step in there. If you want to force this through, you can probably fix it with a simple Perl or Python one-liner that corrects the annotations.txt or secretome.txt file, and then re-run the annotate step. I don't have the bandwidth right now to do any code fixes, so I'm not sure what else to tell you about the source of your issue.

hyphaltip avatar Jun 06 '25 22:06 hyphaltip
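
The workaround suggested in the last comment could look roughly like the Python sketch below. The function names `strip_gene_name` and `fix_annotation_file` are hypothetical, and the sketch assumes the only problem is the extra " gene" text after the transcript name in the first tab-separated column.

```python
def strip_gene_name(line):
    """Drop anything after the first space in the first tab-separated column."""
    fields = line.rstrip("\n").split("\t")
    fields[0] = fields[0].split(" ")[0]
    return "\t".join(fields)


def fix_annotation_file(src, dst):
    """Write a copy of src to dst with cleaned IDs in the first column."""
    with open(src) as infile, open(dst, "w") as outfile:
        for line in infile:
            outfile.write(strip_gene_name(line) + "\n")
```

Lines that already carry only the transcript name pass through unchanged, so the function can be applied to a whole annotations file. Whether `funannotate annotate` reuses the edited file or regenerates it on the next run is not established in this thread.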