funannotate

Issue parsing iprscan XML file

Open Cat4Lion opened this issue 1 year ago • 4 comments

Are you using the latest release?
1.8.10, singularity install. I can try another install, but I'm just checking that this isn't a simple version mismatch issue.

Describe the bug
Trouble parsing the iprscan XML file.

What command did you issue?

Running funannotate [02/20/24 21:39:51]: /venv/bin/funannotate annotate --input predict_results --iprscan Fv_Wa1orig_interpro.xml --eggnog Fv_WA1orig.eggnog.annotations --antismash Fusarium_virguliforme_WA1orig.antismash.gbk --phobius Fv_WA1orig_phobious_out.txt --rename SAY83 --sbt Fv_template.sbt --species Fusarium virguliforme --isolate WA1 --cpus 8 --busco_db dikarya

Logfiles
The run gets through all other files; the relevant CMD error comes from parsing the XML from interproscan-5.60-92.0:

[02/20/24 21:48:57]: Parsing InterProScan5 XML file
[02/20/24 21:48:57]: /venv/bin/python /venv/lib/python3.8/site-packages/funannotate/aux_scripts/iprscan2annotations.py /local/workdir/keb45/funannotate/Fv_Wa1/annotate_misc/iprscan.xml /local/workdir/keb45/funannotate/Fv_Wa1/annotate_misc/annotations.iprscan.txt
[02/20/24 21:52:51]: CMD ERROR: /venv/bin/python /venv/lib/python3.8/site-packages/funannotate/aux_scripts/iprscan2annotations.py /local/workdir/keb45/funannotate/Fv_Wa1/annotate_misc/iprscan.xml /local/workdir/keb45/funannotate/Fv_Wa1/annotate_misc/annotations.iprscan.txt
[02/20/24 21:52:51]: Error parsing XML GO terms: None is not a valid term

OS/Install Information

  • output of funannotate check --show-versions

Checking dependencies for 1.8.10

You are running Python v 3.8.12. Now checking python packages...
biopython: 1.77
goatools: 1.2.3
matplotlib: 3.5.1
natsort: 8.1.0
numpy: 1.22.3
pandas: 1.4.2
psutil: 5.9.0
requests: 2.27.1
scikit-learn: 1.0.2
scipy: 1.5.3
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000024
threads: 2.15
threads::shared: 1.56
ERROR: Bio::Perl not installed, install with cpanm Bio::Perl

Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/venv/config
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.15
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.9.1-internal
kallisto: 0.46.1
mafft: v7.505 (2022/Apr/10)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: pigz 2.6
proteinortho: 6.0.16
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 31
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed

Cat4Lion, Feb 21 '24 03:02

It would help to have a copy of the XML file causing the error.

hyphaltip, Feb 21 '24 19:02

It's way too big, but here are annotations for the first gene, maybe just to check it's not a file format conflict issue. Thanks.

iprscan_sample.txt

Cat4Lion, Feb 22 '24 02:02
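When the XML is too large to attach, one option is to scan it locally and share only the suspicious records. The following is a minimal sketch, not funannotate's own parser. It assumes that GO cross-references appear as `go-xref` elements carrying an `id` attribute, and it matches on the local tag name so the XML namespace does not matter. The function name `find_go_xrefs_without_id` and the sample document (including its `example.org` namespace) are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def find_go_xrefs_without_id(source):
    """Return the attributes of every go-xref element lacking a usable 'id'.

    'source' may be a file path or a binary file object. The file is
    streamed with iterparse instead of being loaded in one piece.
    Assumption: GO cross-references are 'go-xref' elements with an 'id'
    attribute; adjust the names if your file differs.
    """
    missing = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "go-xref" and not elem.get("id"):
            missing.append(dict(elem.attrib))
        # Drop the element's content once it has been inspected.
        elem.clear()
    return missing


# Hypothetical two-record sample; the second go-xref has no id.
SAMPLE_XML = b"""<protein-matches xmlns="http://example.org/ns">
  <protein>
    <go-xref category="BIOLOGICAL_PROCESS" db="GO" id="GO:0008150" name="biological_process"/>
    <go-xref category="MOLECULAR_FUNCTION" db="GO" name="molecular_function"/>
  </protein>
</protein-matches>"""

if __name__ == "__main__":
    for attrs in find_go_xrefs_without_id(io.BytesIO(SAMPLE_XML)):
        print(attrs)
```

An empty result would suggest the XML itself is not the source of a `None` GO term, in which case the problem more likely lies elsewhere in the pipeline.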

Hi @Cat4Lion, it seems you are running a development version, i.e. 1.8.10; the even-numbered versions are not releases but intermediate steps. The error isn't actually in your XML file. It comes from parsing the GO OBO file, which must contain some ontology term the parser was not expecting. Looking at our current code, this error should no longer be displayed; an unrecognized term should instead default to the GO terminology go_unknown. I'm going to tag a new bug-fix release shortly, but my suggestion would be to try the latest codebase and let us know if you are still having issues. I'm not sure how the singularity image was built, but if it was based on the docker image, the latest tag should be current.

nextgenusfs, Feb 28 '24 19:02
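The fallback described above, where a term missing from the ontology resolves to go_unknown instead of raising an error, can be illustrated with a small sketch. This is not funannotate's actual code and does not use goatools. The helper names `parse_obo_names` and `describe_go_term` are hypothetical, and the OBO handling covers only the `id:` and `name:` lines of `[Term]` stanzas.

```python
def parse_obo_names(obo_text):
    """Map GO IDs to names from OBO-formatted text.

    Only 'id:' and 'name:' lines inside [Term] stanzas are read;
    other stanza types (e.g. [Typedef]) and other tags are ignored.
    """
    names = {}
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; remember whether it is a [Term].
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and line.startswith("name: ") and current_id:
            names[current_id] = line[len("name: "):]
    return names


def describe_go_term(go_id, go_names, default="go_unknown"):
    """Return the name for go_id, or 'default' if it is None or unknown."""
    if go_id is None:
        return default
    return go_names.get(go_id, default)


# Tiny illustrative ontology excerpt.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: GO:0008150
name: biological_process

[Term]
id: GO:0003674
name: molecular_function
"""

if __name__ == "__main__":
    go_names = parse_obo_names(SAMPLE_OBO)
    for term in ("GO:0008150", "GO:9999999", None):
        print(term, "->", describe_go_term(term, go_names))
```

Handling `None` and unknown IDs through a single default keeps one unexpected ontology entry from aborting the whole annotation step.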