funannotate
funannotate copied to clipboard
Bug in how antiSMASH results are handled?
funannotate v1.8.12
Funannotate annotate
failed when I included the argument --antismash /path/to/annotate_misc/antiSMASH.results.gbk
because the file did not exist. I dug into it and discovered that annotate was removing the file from annotate_misc. Is this a bug?
As far as I can tell, funannotate remote
creates a file antiSMASH.results.gbk in annotate_misc (from remote.py):
298 # now grab the GBK files from folder as you will need just that for annotation, place in annotate_misc folder for auto-detection
299 anti_GBK = os.path.join(outputdir, jobid, os.path.basename(genbank))
300 final = os.path.join(outputdir, 'annotate_misc',
301 'antiSMASH.results.gbk')
302 shutil.copyfile(anti_GBK, final)
Then,funannotate annotate
deletes this file without using it (from annotate.py):
1079 # check if antiSMASH data is given, if so parse and reformat for annotations and cluster textual output
1080 antismash_input = os.path.join(
1081 outputdir, 'annotate_misc', 'antiSMASH.results.gbk')
1082 if args.antismash:
1083 if os.path.isfile(antismash_input):
1084 os.remove(antismash_input)
1085 shutil.copyfile(args.antismash, antismash_input)
args.antismash
is from the command line option, it would be False if nothing was passed. So you don't have to pass the GBK file via the --antismash option, if the file is there it will be parsed (ie file will be put there from remote). otherwise it will use the file that exists, ie so if you pass an file there perhaps you re-ran it and it should not use existing file in that location.
Ok—don't use --antismash if I just ran antismash with remote, got it.
With that I am almost to the end of the pipeline ("writing genome annotation table!"), but there is one more error I have no idea what to do with... Seems to suggest that there is an "E" in the nucleotide CDS sequence, but I checked and there is not. Is there something else that could cause this?
[Aug 11 11:21 PM]: OS: CentOS Stream 8, 46 cores, ~ 230 GB RAM. Python: 3.8.12 [Aug 11 11:21 PM]: Running 1.8.12 [Aug 11 11:21 PM]: Found existing output directory annotated_2022-08-10. Warning, will re-use any intermediate files found. [Aug 11 11:21 PM]: Parsing input files [Aug 11 11:21 PM]: Existing tbl found: annotated_2022-08-10/predict_results/Neophaeomoniellasp11164_11164.tbl [Aug 11 11:21 PM]: Adding Functional Annotation to Neophaeomoniellasp11164, NCBI accession: None [Aug 11 11:21 PM]: Annotation consists of: 10,251 gene models [Aug 11 11:21 PM]: 10,210 protein records loaded [Aug 11 11:21 PM]: Existing Pfam-A results found: annotated_2022-08-10/annotate_misc/annotations.pfam.txt [Aug 11 11:21 PM]: 12,247 annotations added [Aug 11 11:21 PM]: Running Diamond blastp search of UniProt DB version 2022_02 [Aug 11 11:21 PM]: 640 valid gene/product annotations from 986 total [Aug 11 11:21 PM]: Existing Eggnog-mapper results found: annotated_2022-08-10/annotate_misc/eggnog.emapper.annotations [Aug 11 11:21 PM]: Parsing EggNog Annotations [Aug 11 11:21 PM]: EggNog version parsed as 2.1.9 [Aug 11 11:21 PM]: 19,792 COG and EggNog annotations added [Aug 11 11:21 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.79 [Aug 11 11:21 PM]: 2,670 gene name and product description annotations added [Aug 11 11:21 PM]: Existing MEROPS results found: annotated_2022-08-10/annotate_misc/annotations.merops.txt [Aug 11 11:21 PM]: 368 annotations added [Aug 11 11:21 PM]: Existing CAZYme results found: annotated_2022-08-10/annotate_misc/annotations.dbCAN.txt [Aug 11 11:21 PM]: 456 annotations added [Aug 11 11:21 PM]: Existing BUSCO2 results found: annotated_2022-08-10/annotate_misc/annotations.busco.txt [Aug 11 11:21 PM]: 1,274 annotations added [Aug 11 11:21 PM]: Existing Phobius results found: annotated_2022-08-10/annotate_misc/phobius.results.txt [Aug 11 11:21 PM]: Existing SignalP results found: annotated_2022-08-10/annotate_misc/signalp.results.txt [Aug 11 11:21 PM]: 711 secretome and 2,373 transmembane annotations added [Aug 11 11:21 PM]: Parsing InterProScan5 XML file [Aug 11 11:24 PM]: Now parsing antiSMASH v6 results, finding SM clusters [Aug 11 11:24 PM]: Found 32 clusters, 76 biosynthetic enyzmes, and 124 smCOGs predicted by antiSMASH [Aug 11 11:24 PM]: Found 0 duplicated annotations, adding 85,689 valid annotations [Aug 11 11:24 PM]: Converting to final Genbank format, good luck! [Aug 11 11:25 PM]: Creating AGP file and corresponding contigs file [Aug 11 11:25 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4 [Aug 11 11:26 PM]: Creating tab-delimited SM cluster output [Aug 11 11:26 PM]: Writing genome annotation table.
Traceback (most recent call last):
File "/hpc/home/idm7/miniconda3/envs/funannotate/bin/funannotate", line 8, in
I have not seen this before, and yes it does suggest there is an E in the CDS transcript somehow. It should be writing that file line by line so if you look at the last record it wrote it should be the gene immediately after that causing the problem. Unfortunately you can't likely see the dictionary value it's trying to reverse complement. The data are loaded/parsed from the tbl file I believe and are actually all written again in annotate_results should also be a cds transcript file. So doesn't make a lot of sense if it wrote it correctly there but then failed later. You'd have to put a try/except to catch that error and then output the offending record.
Took a look at the cds transcript file in annotate_results as you suggested, and... bizarrely, it's doing the thing from #763 again — printing the same sequence to the output over and over. When I ran annotate
again without --rename
, everything completed without errors.
Weird. Perhaps try to install the latest release?
This turned out to be an error in how I was providing the locus tag prefix to funannotate; because of a poorly constructed table file, there was a hidden newline character in my variable $locus_tag that was messing everything else up. Closing since fixing the table file solved the problem.