funannotate
Error in funannotate annotate
Hi
I am using funannotate 1.8.7 to annotate a fungal genome.
Before functional annotation, InterProScan, Eggnog-mapper, antiSMASH and SignalP 5.0 were run outside of funannotate:
funannotate iprscan -i /fun/update_results/species.proteins.fa -o iprscan.xml -m local --iprscan_path /opt/biosoft/interproscan-5.45-80.0/interproscan.sh --cpus 20
antismash /hap1_masked.fas --genefinding-gff3 /fun/update_results/species.gff3
emapper.py -i /fun/update_results/species.proteins.fa --output eggnog_diamond -m diamond --cpu 50
signalp -batch 30000 -org euk -fasta /fun/update_results/species.proteins.fa -gff3 -mature
However, when I ran the command below, I got errors:
funannotate annotate -i fun --eggnog ./eggnogout/eggnog_diamond.emapper.annotations --iprscan iprscan.xml --antismash ./antismashout/hap1_masked.gbk --signalp ./signalpout/species.proteins_summary.signalp5 --busco_db /data/database/BUSCO/basidiomycota_odb9 --cpus 20 --strain "species" --isolate GD1913
ERROR
nohup: ignoring input
[Aug 26 12:45 PM]: OS: Ubuntu 18.04, 160 cores, ~ 1056 GB RAM. Python: 3.7.10
[Aug 26 12:45 PM]: Running 1.8.7
[Aug 26 12:45 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--
[Aug 26 12:45 PM]: Found existing output directory fun. Warning, will re-use any intermediate files found.
[Aug 26 12:45 PM]: Parsing input files
[Aug 26 12:45 PM]: Existing tbl found: fun/update_results/ustilago_maydis.tbl
[Aug 26 12:49 PM]: Adding Functional Annotation to ustilago_maydis, NCBI accession: None
[Aug 26 12:49 PM]: Annotation consists of: 13,839 gene models
[Aug 26 12:49 PM]: 12,882 protein records loaded
[Aug 26 12:49 PM]: Existing Pfam-A results found: fun/annotate_misc/annotations.pfam.txt
[Aug 26 12:49 PM]: 9,033 annotations added
[Aug 26 12:49 PM]: Running Diamond blastp search of UniProt DB version 2021_03
[Aug 26 12:49 PM]: 410 valid gene/product annotations from 544 total
[Aug 26 12:49 PM]: Existing Eggnog-mapper results found: fun/annotate_misc/eggnog.emapper.annotations
[Aug 26 12:49 PM]: Parsing EggNog Annotations
[Aug 26 12:49 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.70
[Aug 26 12:49 PM]: 410 gene name and product description annotations added
[Aug 26 12:49 PM]: Existing MEROPS results found: fun/annotate_misc/annotations.merops.txt
[Aug 26 12:49 PM]: 280 annotations added
[Aug 26 12:49 PM]: Existing CAZYme results found: fun/annotate_misc/annotations.dbCAN.txt
[Aug 26 12:49 PM]: 255 annotations added
[Aug 26 12:49 PM]: Existing BUSCO2 results found: fun/annotate_misc/annotations.busco.txt
[Aug 26 12:49 PM]: 1,262 annotations added
[Aug 26 12:49 PM]: Existing Phobius results found: fun/annotate_misc/phobius.results.txt
[Aug 26 12:49 PM]: Existing SignalP results found: fun/annotate_misc/signalp.results.txt
Traceback (most recent call last):
File "/data/Liangjunmin/opt/biosoft/miniconda3_for_pb-assembly/envs/funannotate/bin/funannotate", line 10, in
Thanks
You should not run antiSMASH with the GFF. Use the GBK file from predict_results and turn off gene finding. Then run SignalP like this: signalp -stdout -org euk -format short -fasta proteins.genome > signalp.out
But it's much easier to just let funannotate run SignalP, as it will run it with multiprocessing by splitting the input and will then ensure the format is correct.
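The idea of splitting the input so that SignalP runs on several chunks at once can be sketched in Python. This is an illustration under stated assumptions, not funannotate's implementation: the helper names, the `chunk_<i>.fa` file names and the thread-pool design are hypothetical, and the SignalP options are copied from the command above. It also assumes `signalp` is on the `PATH`.

```python
"""Hypothetical sketch: split a protein FASTA into chunks, run SignalP 5.0 on
each chunk in parallel, and merge the outputs. Not funannotate's own code."""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Tuple

Record = Tuple[str, str]  # (FASTA header without '>', sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Record] = []
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split_records(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Split records into at most n_chunks contiguous, near-equal chunks."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def write_chunks(chunks: List[List[Record]], outdir: Path) -> List[Path]:
    """Write each chunk to outdir/chunk_<i>.fa (hypothetical file names)."""
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks):
        path = outdir / f"chunk_{i}.fa"
        path.write_text("".join(f">{h}\n{s}\n" for h, s in chunk))
        paths.append(path)
    return paths


def signalp_command(fasta: Path) -> List[str]:
    """SignalP 5.0 options taken from the command suggested in this thread."""
    return ["signalp", "-stdout", "-org", "euk", "-format", "short",
            "-fasta", str(fasta)]


def merge_outputs(outputs: List[str]) -> str:
    """Concatenate chunk outputs, keeping '#' header lines from the first only."""
    merged = []
    for i, out in enumerate(outputs):
        for line in out.splitlines():
            if i > 0 and line.startswith("#"):
                continue
            merged.append(line)
    return "\n".join(merged) + "\n"


def run_signalp_parallel(fasta: Path, workdir: Path, n_chunks: int = 4) -> str:
    """Split, run one SignalP process per chunk, and return the merged output."""
    chunks = split_records(read_fasta(fasta.read_text()), n_chunks)
    paths = write_chunks(chunks, workdir)

    def run_one(path: Path) -> str:
        # Threads are enough here: each worker just waits on an external process.
        result = subprocess.run(signalp_command(path), capture_output=True,
                                text=True, check=True)
        return result.stdout

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        return merge_outputs(list(pool.map(run_one, paths)))
```

For example, `Path("signalp.out").write_text(run_signalp_parallel(Path("proteins.fa"), Path("signalp_chunks"), n_chunks=20))` would write one merged short-format file. Contiguous chunks keep the protein order of the input, so the merged output lists proteins in the same order as a single SignalP run.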
Hi Jon, thanks for your quick reply. Sorry, I cannot use the GBK file to run antiSMASH, since there are several alternative splice isoforms for each gene and the error "multiple CDS features have the same name for mapping" comes up. In the end I used "antismash genome.fasta --genefinding-gff3 /fun/update_results/species.gff3".
For SignalP, how do I run SignalP in funannotate? Sorry, I didn't find the related documentation. I tried SignalP as you suggested. It seems the results are the same as those obtained by "signalp -batch 30000 -org euk -fasta /fun/update_results/species.proteins.fa -gff3 -mature".
Obtained from your way:
# SignalP-5.0 Organism: euk Timestamp: 20210825205549
# ID Prediction SP(Sec/SPI) OTHER CS Position
FUN_000002-T1 OTHER 0.018607 0.981393
FUN_000003-T1 OTHER 0.000467 0.999533
FUN_000004-T1 OTHER 0.000502 0.999498
FUN_000004-T2 OTHER 0.000440 0.999560
FUN_000007-T1 OTHER 0.001157 0.998843
FUN_000009-T1 OTHER 0.001537 0.998463
FUN_000012-T1 OTHER 0.000786 0.999214
FUN_000012-T2 OTHER 0.000786 0.999214
FUN_000012-T3 OTHER 0.000786 0.999214
FUN_000013-T1 OTHER 0.000579 0.999421
FUN_000015-T1 OTHER 0.002556 0.997444
FUN_000017-T1 OTHER 0.002832 0.997168
FUN_000019-T1 OTHER 0.001105 0.998895
FUN_000020-T1 OTHER 0.000495 0.999505
FUN_000025-T1 OTHER 0.001763 0.998237
FUN_000028-T1 OTHER 0.007864 0.992136
FUN_000029-T1 OTHER 0.001680 0.998320
FUN_000030-T1 OTHER 0.001432 0.998568
FUN_000031-T1 OTHER 0.010161 0.989839
FUN_000032-T1 OTHER 0.004603 0.995397
FUN_000033-T1 OTHER 0.001979 0.998021
FUN_000034-T1 OTHER 0.001697 0.998303
FUN_000035-T1 OTHER 0.001207 0.998793
FUN_000036-T1 OTHER 0.014808 0.985192
FUN_000037-T1 OTHER 0.003916 0.996084
FUN_000038-T1 OTHER 0.002448 0.997552
......
Obtained from your way:
# SignalP-5.0 Organism: euk Timestamp: 20210830075525
# ID Prediction SP(Sec/SPI) OTHER CS Position
FUN_000002-T1 OTHER 0.018607 0.981393
FUN_000003-T1 OTHER 0.000467 0.999533
FUN_000004-T1 OTHER 0.000502 0.999498
FUN_000004-T2 OTHER 0.000440 0.999560
FUN_000007-T1 OTHER 0.001157 0.998843
FUN_000009-T1 OTHER 0.001537 0.998463
FUN_000012-T1 OTHER 0.000786 0.999214
FUN_000012-T2 OTHER 0.000786 0.999214
FUN_000012-T3 OTHER 0.000786 0.999214
FUN_000013-T1 OTHER 0.000579 0.999421
FUN_000015-T1 OTHER 0.002556 0.997444
FUN_000017-T1 OTHER 0.002832 0.997168
FUN_000019-T1 OTHER 0.001105 0.998895
FUN_000020-T1 OTHER 0.000495 0.999505
FUN_000025-T1 OTHER 0.001763 0.998237
FUN_000028-T1 OTHER 0.007864 0.992136
FUN_000029-T1 OTHER 0.001680 0.998320
FUN_000030-T1 OTHER 0.001432 0.998568
FUN_000031-T1 OTHER 0.010161 0.989839
FUN_000032-T1 OTHER 0.004603 0.995397
FUN_000033-T1 OTHER 0.001979 0.998021
FUN_000034-T1 OTHER 0.001697 0.998303
FUN_000035-T1 OTHER 0.001207 0.998793
FUN_000036-T1 OTHER 0.014808 0.985192
FUN_000037-T1 OTHER 0.003916 0.996084
FUN_000038-T1 OTHER 0.002448 0.997552
Thanks.
The SignalP parser is expecting the results to be tab-delimited. Are they not tab-delimited for some reason? I do not have a license for SignalP > 4.1, so I've never had an actual copy in hand; I wrote the parser based on user feedback.
About the antiSMASH error: interesting, I've definitely run multi-transcript genomes through with GenBank files, so perhaps it's a new version. But great if the GFF now works with the newest antiSMASH.
Hi Jon, could you give me an example of the SignalP result format used by funannotate annotate?
I can only give an example from v4.1, but the output looks correct for v5 except that the parser is expecting tab-delimited columns. So if your file has spaces and not tabs, that is the problem, and I'll need to update the parser to split on spaces if tabs aren't found.
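The fallback described here (split on whitespace when no tabs are present) can be sketched in Python as below. This is not funannotate's actual parser: the function and class names are hypothetical, and the layout of the CS Position column (`CS pos: 20-21. ...` for predicted signal peptides, empty for `OTHER` rows) is an assumption about SignalP 5.0 short output. Because that column contains spaces itself, the whitespace fallback splits at most four times so the cleavage-site text stays in one field.

```python
"""Hypothetical sketch of a SignalP 5.0 short-format parser that accepts either
tab- or space-delimited columns. Not funannotate's actual parser."""
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed layout of the CS Position column for predicted signal peptides,
# e.g. "CS pos: 20-21. AYA-GS. Pr: 0.7548"; empty for OTHER rows.
CS_RE = re.compile(r"CS pos:\s*(\d+)-(\d+)")


@dataclass
class SignalPHit:
    prediction: str           # "SP(Sec/SPI)" or "OTHER" for -org euk
    sp_prob: float            # SP(Sec/SPI) column
    other_prob: float         # OTHER column
    cleavage: Optional[int]   # last residue of the signal peptide, if reported


def parse_signalp5_short(text: str) -> Dict[str, SignalPHit]:
    hits: Dict[str, SignalPHit] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the two '#' header lines
        if "\t" in line:
            parts = line.split("\t")
        else:
            # Fallback for space-delimited output: split at most 4 times so the
            # multi-word CS Position text stays together in parts[4].
            parts = line.split(None, 4)
        if len(parts) < 4:
            continue  # not a data row
        seq_id, prediction, sp_prob, other_prob = parts[:4]
        cs_field = parts[4] if len(parts) > 4 else ""
        match = CS_RE.search(cs_field)
        hits[seq_id] = SignalPHit(
            prediction=prediction.strip(),
            sp_prob=float(sp_prob),
            other_prob=float(other_prob),
            cleavage=int(match.group(1)) if match else None,
        )
    return hits


def signal_peptides(hits: Dict[str, SignalPHit]) -> Dict[str, SignalPHit]:
    """Keep only proteins predicted to carry a signal peptide."""
    return {k: v for k, v in hits.items() if v.prediction.startswith("SP")}
```

With this approach the same file parses identically whether SignalP wrote tabs or spaces, so the delimiter question only matters for diagnosing why the current parser fails.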
I couldn't resolve it in the end, so I removed --signalp from funannotate annotate and will try to merge the SignalP results manually. Thanks.
So you were unable to tell whether SignalP 5.0 on your computer was generating tab-delimited or space-delimited output? All the data I had seen was tab-delimited; my guess is that your particular version is outputting space-delimited output, which is why the parser is failing. I can fix it, I just need the answer to that question.
Sorry, I may have misunderstood what you meant. Please see the attachment for the SignalP5 outputs I ran separately, outside of funannotate.
The command I used for signalp5 was signalp -batch 30000 -org euk -format short -fasta proteins.fa -gff3 -mature
Thanks for your generous help.
Junmin Liang State Key Laboratory of Mycology, Institute of Microbiology Chinese Academy of Sciences
No.1 Beichen West Road, Chaoyang District, Beijing, P. R. China 100101
I cannot tell how the columns in your file are delimited when you paste it into GitHub. Either attach the signalP output file or open in a text editor and check if there are spaces in between the columns or tabs.
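One way to answer the tab-or-space question without relying on how GitHub renders a paste is to print the first data line with `repr()`, so tabs show up as `\t`. The Python sketch below does that; the script and function names are hypothetical. On Linux, `cat -A signalp.out | head` gives a similar check, displaying tabs as `^I`.

```python
"""Hypothetical helper: report whether a SignalP output file uses tabs or
spaces between columns, showing the first data line with repr()."""
import sys
from pathlib import Path
from typing import Optional


def first_data_line(text: str) -> Optional[str]:
    """Return the first non-blank line that is not a '#' header, if any."""
    for line in text.splitlines():
        if line.strip() and not line.startswith("#"):
            return line
    return None


def detect_delimiter(text: str) -> str:
    """Classify the column delimiter of the first data line."""
    line = first_data_line(text)
    if line is None:
        return "unknown"
    if "\t" in line:
        return "tab"
    if " " in line:
        return "space"
    return "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    content = Path(sys.argv[1]).read_text()
    # repr() makes tabs visible as \t instead of rendering them as whitespace.
    print(repr(first_data_line(content)))
    print("delimiter:", detect_delimiter(content))
```

Saved as, say, `check_delimiter.py`, it would be run as `python check_delimiter.py species.proteins_summary.signalp5`.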