funannotate ERROR: GBK file conversion failed, tbl2asn parallel script has died

Hello again ;_;

I ran into another issue

Thanks in advance!

Are you using the latest release? v1.8.8

Describe the bug: A little unsure about this.

output error:

ERROR: GBK file conversion failed, tbl2asn parallel script has died

I tried running the command outside of the funannotate predict workflow, and the initial issue was a core dump. I requested more memory and got around the core dump.

From predict log files:

[05/24/21 15:02:27]: Found 541 gene models to remove: 4 too short; 0 span gaps; 537 transposable elements
[05/24/21 15:02:27]: 21,702 gene models remaining
[05/24/21 15:02:27]: Predicting tRNAs
[05/24/21 15:02:28]: bedtools intersect -sorted -v -a /u/project/kruglyak/thatguy0/genomics/rnaseq/trinity_out_dir_genome_assisted/funannotate_run/fun_predict/predict_misc/trnascan.gff3.sorted.gff3 -b /u/project/kruglyak/thatguy0/genomics/rnaseq/trinity_out_dir_genome_assisted/funannotate_run/fun_predict/predict_misc/evm.cleaned.gff3.sorted.gff3 /u/project/kruglyak/thatguy0/genomics/rnaseq/trinity_out_dir_genome_assisted/funannotate_run/fun_predict/predict_misc/assembly-gaps.bed.sorted.gff3
[05/24/21 15:02:29]: 435 tRNAscan models are valid (non-overlapping)
[05/24/21 15:02:29]: Generating GenBank tbl annotation file
[05/24/21 15:02:44]: Converting to final Genbank format
[05/24/21 15:02:44]: /u/project/kruglyak/thatguy0/conda/envs/funannotate/bin/python /u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/aux_scripts/tbl2asn_parallel.py -i ./predict_misc/tbl2asn/genome.tbl -f /u/project/kruglyak/thatguy0/genomics/rnaseq/trinity_out_dir_genome_assisted/funannotate_run/fun_predict/predict_misc/genome.softmasked.fa -o ./predict_misc/tbl2asn --sbt /u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/config/test.sbt -d ./predict_results/caenorhabditis_XZ1516.discrepency.report.txt -s caenorhabditis -t -l paired-ends -v 1 -c 1 --isolate XZ1516
[05/24/21 15:02:48]: ERROR: GBK file conversion failed, tbl2asn parallel script has died

What command did you issue? The same one as in my issue from earlier today, but with the --keep_evm flag:

funannotate predict -i ../XZ1516_renamed.fasta \
	--species "caenorhabditis" \
	--isolate XZ1516 \
	--transcript_evidence ../fun_XZ1516/training/trinity.fasta.clean \
	--rna_bam ../Aligned.sortedByCoord.out.bam \
	--busco_db nematoda \
	--cpus 8 \
	--organism other \
	--busco_seed_species caenorhabditis \
	-o . \
	--keep_evm

This fails as outlined above, but I can successfully run the line causing the issue outside of the predict workflow:

/u/project/kruglyak/thatguy0/conda/envs/funannotate/bin/python /u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/aux_scripts/tbl2asn_parallel.py -i ./predict_misc/tbl2asn/genome.tbl -f /u/project/kruglyak/thatguy0/genomics/rnaseq/trinity_out_dir_genome_assisted/funannotate_run/fun_predict/predict_misc/genome.softmasked.fa -o ./predict_misc/tbl2asn --sbt /u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/config/test.sbt -d ./predict_results/caenorhabditis_XZ1516.discrepency.report.txt -s caenorhabditis -t -l paired-ends -v 1 -c 1 --isolate XZ1516

A couple of things about that line:

I'm unsure about the sbt file; this appears to be a default: --sbt /u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/config/test.sbt
The .val and .gbf files in predict_misc/tbl2asn are both empty.

Log files: logfiles.zip

OS/Install Information

-------------------------------------------------------
Checking dependencies for 1.8.8
-------------------------------------------------------
You are running Python v 3.7.10. Now checking python packages...
biopython: 1.78
goatools: 1.0.15
matplotlib: 3.4.2
natsort: 7.1.1
numpy: 1.20.2
pandas: 1.2.4
psutil: 5.8.0
requests: 2.25.1
scikit-learn: 0.24.2
scipy: 1.6.3
seaborn: 0.11.1
All 11 python packages installed


You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.302
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.13
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.30
threads: 2.21
threads::shared: 1.56
All 27 Perl modules installed


Checking Environmental Variables...
$FUNANNOTATE_DB=/u/project/kruglyak/thatguy0/funannotate_db
$PASAHOME=/u/project/kruglyak/thatguy0/conda/envs/funannotate/opt/pasa-2.4.1
$TRINITY_HOME=/u/project/kruglyak/thatguy0/conda/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/u/project/kruglyak/thatguy0/conda/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/u/project/kruglyak/thatguy0/conda/envs/funannotate/config/
$GENEMARK_PATH=/u/project/kruglyak/thatguy0/bin/gmes_linux_64/
All 6 environmental variables are set
-------------------------------------------------------
Checking external dependencies...
	ERROR: signalp found but error running signalp
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.8
emapper.py: 2.1.2
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
gmes_petap.pl: 4.65_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.1
kallisto: 0.46.1
mafft: v7.475 (2020/Nov/23)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.18-r1015
proteinortho: 6.0.30
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.10
snap: 2006-07-28
stringtie: 2.1.5
tRNAscan-SE: 2.0.7 (Oct 2020)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
	ERROR: signalp not installed

May 24 '21 22:05 Thatguy027

So it sounds like a memory issue, then? Maybe I misunderstood, but you seemed to suggest that you could run the following command manually and it executed successfully?

/u/project/kruglyak/thatguy0/conda/envs/funannotate/bin/python /u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/aux_scripts/tbl2asn_parallel.py \
    -i ./predict_misc/tbl2asn/genome.tbl \
    -f /u/project/kruglyak/thatguy0/genomics/rnaseq/trinity_out_dir_genome_assisted/funannotate_run/fun_predict/predict_misc/genome.softmasked.fa \
    -o ./predict_misc/tbl2asn \
    --sbt /u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/config/test.sbt \
    -d ./predict_results/caenorhabditis_XZ1516.discrepency.report.txt \
    -s caenorhabditis \
    -t -l paired-ends -v 1 \
    -c 1 --isolate XZ1516

As you have this written, it will use 1 CPU (i.e., -c 1), and that apparently works? But perhaps it hit a malloc error when it was run inside funannotate predict with --cpus 8. The point of this wrapper is to speed up tbl2asn: it chunks the input and launches parallel processes. I have no idea how much memory it uses.
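
To make the memory point concrete: with -c N, up to N tbl2asn processes are alive at once, each holding its own chunk, so total memory in use can be expected to grow with N. The sketch below is a hypothetical illustration of the chunking idea only; the contig names, lengths and the greedy largest-first strategy are assumptions, not what tbl2asn_parallel.py actually does.

```python
# Hypothetical illustration only: NOT the code in funannotate's tbl2asn_parallel.py,
# just the general "chunk the input, one worker per chunk" idea.
from typing import Dict, List


def split_into_chunks(contig_lengths: Dict[str, int], n_chunks: int) -> List[List[str]]:
    """Greedily assign contigs (largest first) to whichever chunk has the fewest bp so far."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    totals = [0] * n_chunks
    for name, length in sorted(contig_lengths.items(), key=lambda kv: kv[1], reverse=True):
        i = totals.index(min(totals))  # index of the lightest chunk so far
        chunks[i].append(name)
        totals[i] += length
    # Drop empty chunks when there are fewer contigs than workers.
    return [c for c in chunks if c]


if __name__ == "__main__":
    lengths = {"ctg1": 900, "ctg2": 500, "ctg3": 400, "ctg4": 100}
    # Each inner list would be handed to its own worker process.
    print(split_into_chunks(lengths, 2))
```

Lowering -c trades speed for fewer chunks resident in memory at the same time.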

You could try wrapping the command in /usr/bin/time -v to get the peak memory used, to see whether it is requesting too much. I guess you must not have swap enabled, since it died?
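
A Python equivalent of the /usr/bin/time -v suggestion, as a minimal sketch: resource.getrusage(RUSAGE_CHILDREN) exposes the same "maximum resident set size" figure that time -v prints. Caveat: it is the peak of the single largest finished child process, not the sum across concurrently running tbl2asn workers, so it can under-report the aggregate footprint of a parallel run. The script name in the comment is a placeholder.

```python
import resource
import subprocess
import sys
from typing import List


def run_and_report_peak_rss(cmd: List[str]) -> int:
    """Run cmd to completion, then return ru_maxrss for waited-for child processes.

    On Linux the value is in KiB (the unit /usr/bin/time -v prints as
    "Maximum resident set size (kbytes)"); on macOS it is in bytes.
    The figure is a running maximum over ALL children of this process,
    so use one fresh Python process per measurement.
    """
    subprocess.run(cmd, check=False)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # e.g. python peak_rss.py python tbl2asn_parallel.py -i ... -c 8
    command = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print("peak RSS:", run_and_report_peak_rss(command))
```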

May 24 '21 23:05 nextgenusfs

It was a memory issue at first, but I think I got around that (I probably shouldn't have mentioned it, sorry for the confusion).

To clarify, funannotate predict is failing at the tbl2asn step. I mentioned the core dump because that was the first issue I ran into when trying to run the last command from the log outside of the predict workflow.

The command you pasted technically works, in that there are no errors, but the .val and .gbf files are empty, and so is the discrepancy report.

When trying to complete the pipeline, I changed funannotate predict --cpus to 1, but that didn't fix anything.

In case it is useful, I think I have some GLIBC issues in the environment, because tbl2asn throws an error. I'm not sure how useful this is, because GLIBC is included in the funannotate conda install from what I can tell:

(funannotate) [thatguy0@n6459 fun_predict]$ tbl2asn
tbl2asn: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by tbl2asn)
(funannotate) [thatguy0@n6459 fun_predict]$ /u/project/kruglyak/thatguy0/conda/envs/funannotate/bin/python /u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/aux_scripts/tbl2asn_parallel.py \
> -i ./predict_misc/tbl2asn/genome.tbl \
> -f /u/project/kruglyak/thatguy0/genomics/rnaseq/XZ1516_GZ_B5_A8/trinity_out_dir_genome_assisted/funannotate_run/fun_predict/predict_misc/genome.softmasked.fa \
> -o ./predict_misc/tbl2asn \
> --sbt /u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/config/test.sbt \
> -d ./predict_results/caenorhabditis_XZ1516.discrepency.report.txt -s caenorhabditis -t TBL2ASN -v 1 -c 1 --isolate XZ1516
(funannotate) [thatguy0@n6459 fun_predict]$
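
The GLIBC_2.14 message above means this tbl2asn binary was linked against a newer glibc than the one in /lib64. A minimal sketch for comparing the two, assuming a glibc-based Linux host: it parses the required version out of the loader error and reads the host version with platform.libc_ver(), which reports the glibc the running Python is linked against (normally the system one). A host version older than the required one would be consistent with loading the cluster's glibc module making tbl2asn work.

```python
import platform
import re
from typing import Optional, Tuple

# Matches loader errors such as:
#   /lib64/libc.so.6: version `GLIBC_2.14' not found (required by tbl2asn)
_MISSING_GLIBC = re.compile(r"version `GLIBC_(\d+(?:\.\d+)*)' not found")


def required_glibc(error_text: str) -> Optional[str]:
    """Return the GLIBC version named in a loader error, or None if there is none."""
    match = _MISSING_GLIBC.search(error_text)
    return match.group(1) if match else None


def version_tuple(version: str) -> Tuple[int, ...]:
    """'2.14' -> (2, 14), so versions compare numerically rather than as strings."""
    return tuple(int(part) for part in version.split("."))


def host_glibc() -> Optional[str]:
    """glibc version the current Python is linked against (None if not glibc)."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" and version else None


if __name__ == "__main__":
    err = "tbl2asn: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by tbl2asn)"
    need, have = required_glibc(err), host_glibc()
    print(f"binary needs GLIBC >= {need}; host provides {have}")
    if need and have and version_tuple(have) < version_tuple(need):
        print("host glibc is too old for this tbl2asn build")
```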

May 24 '21 23:05 Thatguy027

Okay, I would try re-installing tbl2asn, or even downloading the binary from NCBI and removing the one from bioconda.

May 24 '21 23:05 nextgenusfs

Yes, exactly: we use a system-installed tbl2asn, not the bioconda one.

May 24 '21 23:05 hyphaltip

You can "force" remove dependencies with conda with the --force flag (I think):

conda remove -n funannotate tbl2asn --force

That should remove just tbl2asn from that environment; then add the system-installed one to your PATH.
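
The force-remove only helps if the system tbl2asn is then the one found first, so it can be worth listing every tbl2asn on PATH in lookup order (shutil.which only returns the first hit). A minimal sketch; the helper name all_on_path is hypothetical:

```python
import os
from typing import List, Optional


def all_on_path(name: str, path: Optional[str] = None) -> List[str]:
    """Every executable called `name` on PATH, in lookup order; the first entry is what runs."""
    search = os.environ.get("PATH", "") if path is None else path
    hits: List[str] = []
    for directory in search.split(os.pathsep):
        candidate = os.path.join(directory or ".", name)  # empty entry = current dir
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK) and candidate not in hits:
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for i, exe in enumerate(all_on_path("tbl2asn")):
        print("active  " if i == 0 else "shadowed", exe)
```

Run it both interactively and inside a batch job; if the first entry differs, that points at the environment setup rather than tbl2asn itself.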

May 24 '21 23:05 nextgenusfs

I just tried all of your suggestions, with no luck.

A couple of questions to try to get around this issue:

What are the expected outputs when tbl2asn is done?
What is the tbl2asn command that is actually being run (outside of the wrapper)? Maybe I can see what error that throws.

Thanks again

May 24 '21 23:05 Thatguy027

In its simplest form it would be something like:

tbl2asn -t /u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/config/test.sbt \
   -M n -V b -c f -a r10u -p predict_misc/tbl2asn

The tbl2asn folder should contain genome.fsa (the genome) and genome.tbl (the annotation).
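
Scripting that manual run makes it easy to capture tbl2asn's exit code and stderr. A minimal sketch: the flags are copied verbatim from the command above (their meanings are in tbl2asn's own usage text and are not restated here), and it assumes tbl2asn is on PATH and that the folder holds genome.fsa and genome.tbl as described. The helper names and the shortened sbt path are placeholders.

```python
import subprocess
from pathlib import Path
from typing import List


def missing_inputs(folder: str) -> List[str]:
    """tbl2asn input files (per the description above) that are absent from folder."""
    return [n for n in ("genome.fsa", "genome.tbl") if not (Path(folder) / n).is_file()]


def build_tbl2asn_cmd(sbt: str, folder: str) -> List[str]:
    # Flags copied verbatim from the command above.
    return ["tbl2asn", "-t", sbt, "-M", "n", "-V", "b", "-c", "f",
            "-a", "r10u", "-p", folder]


def run_tbl2asn(sbt: str, folder: str) -> subprocess.CompletedProcess:
    missing = missing_inputs(folder)
    if missing:
        raise FileNotFoundError(f"missing in {folder}: {', '.join(missing)}")
    return subprocess.run(build_tbl2asn_cmd(sbt, folder), capture_output=True, text=True)


if __name__ == "__main__":
    result = run_tbl2asn("test.sbt", "predict_misc/tbl2asn")  # use the full config/test.sbt path
    print("exit code:", result.returncode)
    print(result.stderr)
```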

May 24 '21 23:05 nextgenusfs

Ah, then it sounds like tbl2asn is working fine because those files were generated...

May 25 '21 00:05 Thatguy027

Sorry I wasn't clear: the genome and annotation files are the input. You can essentially run the command I entered manually, and it should hopefully yield an interpretable error.

Output files will be generated in that same folder, i.e., genome.gbf is the GenBank flatfile, but several other submission files will be generated as well.
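
A quick way to tell a "silent" tbl2asn failure (outputs present but empty, as reported earlier in this thread) from a run that never wrote anything is to check file sizes. A minimal sketch; the expected names are an assumption drawn from the files predict.py copies, and tbl2asn may write other submission files as well.

```python
from pathlib import Path
from typing import Dict, Iterable

# Assumed output names (taken from the files predict.py copies after tbl2asn).
EXPECTED = ("genome.gbf", "genome.val", "errorsummary.val")


def output_status(folder: str, names: Iterable[str] = EXPECTED) -> Dict[str, str]:
    """Map each expected output to 'ok', 'empty' or 'missing'."""
    status: Dict[str, str] = {}
    for name in names:
        p = Path(folder) / name
        if not p.is_file():
            status[name] = "missing"
        elif p.stat().st_size == 0:
            status[name] = "empty"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    for name, state in output_status("predict_misc/tbl2asn").items():
        print(f"{name:20s} {state}")
```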

May 25 '21 00:05 nextgenusfs

Annoyingly, it works like a charm:

(funannotate) [thatguy0@n6459 fun_predict]$ tbl2asn -t /u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/config/test.sbt \
>    -M n -V b -c f -a r10u -p predict_misc/tbl2asn
[tbl2asn] Flatfile genome

[tbl2asn] Validating genome
(funannotate) [thatguy0@n6459 fun_predict]$

So I think the issue is GLIBC, because I needed to load the module from my cluster's install to make tbl2asn work. However, when that module is loaded, the "Loading genome assembly and parsing soft-masked repetitive sequences" step fails. I had some issues with GLIBC originally when installing dependencies, but thought I had solved them... seems like I didn't?

Does tbl2asn need to be run within the predict workflow, or can the output I generated with the line above be read in when I relaunch (testing this now)?

May 25 '21 00:05 Thatguy027

If it works interactively but not when you submit a job to the cluster, then it sounds like your GLIBC error is related to the order in which modules are loaded, or to something associated with activating that environment. Writing the GBK output files (i.e., tbl2asn) is one of the last steps; after it, the scripts reformat a few things and move files into the proper places so they can be parsed by funannotate update, funannotate annotate, etc. Ideally it would just work as intended.

May 25 '21 18:05 nextgenusfs

I don't load any modules from the cluster apart from glibc for this particular step.

Agreed, would be awesome if it worked in my hands ;_;

I am looking at the predict.py script and it's SO close to finishing. Please correct me if I'm wrong, but this seems like all that my run is missing at the moment:

shutil.copyfile(os.path.join(gag3dir, 'genome.gbf'), final_gbk)
shutil.copyfile(os.path.join(gag3dir, 'genome.tbl'), final_tbl)
shutil.copyfile(os.path.join(gag3dir, 'genome.val'), final_validation)
shutil.copyfile(os.path.join(gag3dir, 'errorsummary.val'), final_error)
lib.tbl2allout(final_tbl, MaskGenome, final_gff, final_proteins,
               final_transcripts, final_cds_transcripts, final_fasta)
lib.annotation_summary(MaskGenome, final_stats, tbl=final_tbl,
                       transcripts=Transcripts, proteins=Exonerate,
                       database=FUNDB, command=' '.join(sys.argv),
                       organism=organism_name)
total = lib.countGFFgenes(final_gff)
lib.log.info("Collecting final annotation files for {:,} total gene models".format(total))
lib.log.info("Funannotate predict is finished, output files are in the %s/predict_results folder" % (args.out))

This is probably out of scope for an issue page, but could you let me know which files to move over to the results folder so I can use your downstream utilities? This is what is in the misc folder (not everything is listed, just the files written since the EVM output):

(funannotate) [thatguy0@n6459 fun_predict]$ ls -lat predict_misc/
total 1247716
drwxr-xr-x   4 thatguy0 kruglyak      4096 May 25 19:26 tbl2asn
drwxr-xr-x   5 thatguy0 kruglyak      4096 May 25 18:57 ..
drwxr-xr-x  11 thatguy0 kruglyak     12288 May 25 00:57 .
-rw-r--r--   1 thatguy0 kruglyak    121981 May 25 00:57 trnascan.no-overlaps.gff3
-rw-r--r--   1 thatguy0 kruglyak         0 May 25 00:57 assembly-gaps.bed.sorted.gff3
-rw-r--r--   1 thatguy0 kruglyak  31268809 May 25 00:57 evm.cleaned.gff3.sorted.gff3
-rw-r--r--   1 thatguy0 kruglyak    210770 May 25 00:57 trnascan.gff3.sorted.gff3
-rw-r--r--   1 thatguy0 kruglyak  31290510 May 25 00:57 evm.cleaned.gff3
-rw-r--r--   1 thatguy0 kruglyak    628883 May 25 00:57 bad_models.gff
-rw-r--r--   1 thatguy0 kruglyak    333377 May 25 00:57 genome.repeats.to.remove.gff
-rw-r--r--   1 thatguy0 kruglyak  33529503 May 25 00:57 evm.round1.gff3.sorted.gff
-rw-r--r--   1 thatguy0 kruglyak   2850230 May 25 00:57 repeatmasker.bed.sorted.bed
-rw-r--r--   1 thatguy0 kruglyak     33539 May 25 00:57 repeat.gene.models.txt
-rw-r--r--   1 thatguy0 kruglyak  17443804 May 25 00:57 repeats.xml
-rw-r--r--   1 thatguy0 kruglyak  10019816 May 25 00:56 evm.round1.proteins.fa
-rw-r--r--   1 thatguy0 kruglyak       185 May 25 00:56 weights.evm.txt
-rw-r--r--   1 thatguy0 kruglyak 122591758 May 25 00:56 gene_predictions.gff3
-rw-r--r--   1 thatguy0 kruglyak  23728480 May 25 00:56 augustus.evm.gff3
-rw-r--r--   1 thatguy0 kruglyak  24381370 May 25 00:56 augustus.evm.gff3.bak
-rw-r--r--   1 thatguy0 kruglyak  29363196 May 25 00:56 genemark.evm.gff3
-rw-r--r--   1 thatguy0 kruglyak  29363196 May 25 00:56 genemark.evm.gff3.bak
-rw-r--r--   1 thatguy0 kruglyak  29363196 May 25 00:56 genemark.temp.gff
-rw-r--r--   1 thatguy0 kruglyak  20369699 May 25 00:56 hints.ALL.gff
-rw-r--r--   1 thatguy0 kruglyak  27568706 May 25 00:56 hints.all.sort.tmp
-rw-r--r--   1 thatguy0 kruglyak  27568706 May 25 00:55 hints.all.tmp
-rw-r--r--   1 thatguy0 kruglyak   5754265 May 25 00:55 hints.P.gff
-rw-r--r--   1 thatguy0 kruglyak 210339823 May 25 00:55 proteins.combined.fa
-rw-r--r--   1 thatguy0 kruglyak 107111469 May 25 00:55 genome.softmasked.fa
-rw-r--r--   1 thatguy0 kruglyak         0 May 25 00:55 assembly-gaps.bed
-rw-r--r--   1 thatguy0 kruglyak   2850230 May 25 00:55 repeatmasker.bed
drwxr-xr-x   2 thatguy0 kruglyak      4096 May 24 21:53 1
-rw-r--r--   1 thatguy0 kruglyak  33551746 May 24 20:47 evm.round1.gff3
drwxr-xr-x 132 thatguy0 kruglyak     16384 May 24 20:11 EVM
-rw-r--r--   1 thatguy0 kruglyak    210786 May 24 20:02 trnascan.gff3
-rw-r--r--   1 thatguy0 kruglyak     38488 May 24 20:02 tRNAscan.len-filtered.out
-rw-r--r--   1 thatguy0 kruglyak     38548 May 24 20:02 tRNAscan.out
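
For anyone stuck at the same point, the four shutil.copyfile() lines from predict.py quoted above can be reproduced by hand. A minimal sketch only: it copies whatever non-empty outputs tbl2asn left in predict_misc/tbl2asn into predict_results, but the destination suffixes are hypothetical placeholders (only the <prefix>.tbl pattern is confirmed by logs in this thread), and it does not reproduce the lib.tbl2allout() / lib.annotation_summary() steps, which are funannotate internals.

```python
import shutil
from pathlib import Path
from typing import Dict

# Source names come from the shutil.copyfile() calls in predict.py quoted above.
# Destination suffixes are HYPOTHETICAL placeholders, not funannotate's confirmed naming.
OUTPUT_MAP = {
    "genome.gbf": ".gbk",
    "genome.tbl": ".tbl",
    "genome.val": ".validation.txt",
    "errorsummary.val": ".error.summary.txt",
}


def copy_tbl2asn_outputs(tbl2asn_dir: str, results_dir: str, prefix: str) -> Dict[str, Path]:
    """Copy non-empty tbl2asn outputs to results_dir as <prefix><suffix>; return what was copied."""
    out = Path(results_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied: Dict[str, Path] = {}
    for src_name, suffix in OUTPUT_MAP.items():
        src = Path(tbl2asn_dir) / src_name
        if src.is_file() and src.stat().st_size > 0:  # skip missing or empty files
            dst = out / f"{prefix}{suffix}"
            shutil.copyfile(src, dst)
            copied[src_name] = dst
    return copied


if __name__ == "__main__":
    done = copy_tbl2asn_outputs("predict_misc/tbl2asn", "predict_results", "caenorhabditis_XZ1516")
    for src, dst in done.items():
        print(f"{src} -> {dst}")
```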

May 25 '21 19:05 Thatguy027

I can tell you where stuff should go, but the problem is that this same step is at the end of the update/annotate scripts as well. I could modify the code so it doesn't completely fail when tbl2asn fails, but tbl2asn is kind of an important step, as it generates all of the data for NCBI submission.

I wonder if it is the genome size. If you run funannotate test -t predict --cpus X, does that also fail at the same step?

May 25 '21 19:05 nextgenusfs

Hi @Thatguy027, I reorganized the last few steps there so all files should be in their proper place before tbl2asn dies. I also added a step that tries to run tbl2asn single-threaded as a backup; I'm not sure whether this will solve your problem. It would be great if you could test it and provide some feedback. Update with pip:

python -m pip install git+https://github.com/nextgenusfs/funannotate.git --upgrade --no-deps --force

May 25 '21 22:05 nextgenusfs

Awesome, I will definitely test it out.

The test run you suggested failed in my hands at the tbl2asn step, though with some additional messages:

[May 25 02:21 PM]: ERROR: GBK file conversion failed, tbl2asn parallel script has died
#########################################################
Traceback (most recent call last):
  File "/u/project/kruglyak/thatguy0/conda/envs/funannotate/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/funannotate.py", line 705, in main
    mod.main(arguments)
  File "/u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/test.py", line 405, in main
    runPredictTest(args)
  File "/u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/test.py", line 161, in runPredictTest
    tmpdir, 'annotate', 'predict_results', 'Awesome_testicus.gff3')) <= 1800
  File "/u/project/kruglyak/thatguy0/conda/envs/funannotate/lib/python3.7/site-packages/funannotate/test.py", line 45, in countGFFgenes
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-predict_94c26190-b23d-41b9-afd3-2fdafe355b95/annotate/predict_results/Awesome_testicus.gff3'

May 25 '21 22:05 Thatguy027

Okay, that is just the wrapper for the test function failing; several layers of Python there make the error not very intelligible. So it's dying with the same error, related either to a non-functional tbl2asn binary or to some yet-to-be-determined error in the tbl2asn_parallel.py script.

May 25 '21 22:05 nextgenusfs

Hi @nextgenusfs, I hope you are well. Is there any follow-up on this error? I'm having the same issue with both the Docker and conda versions. If possible, is there an older funannotate version that I can use to avoid this error?

Dec 01 '22 21:12 kalonji08

I cannot replicate this with docker image.

$ docker images
REPOSITORY                     TAG       IMAGE ID       CREATED         SIZE
nextgenusfs/funannotate        latest    a8fe0e20b963   4 days ago      12.6GB

$ ./funannotate-docker test -t predict --cpus 4
#########################################################
Running `funannotate predict` unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 4 --species Awesome testicus
#########################################################
-------------------------------------------------------
[Dec 01 02:05 PM]: OS: Debian GNU/Linux 10, 4 cores, ~ 8 GB RAM. Python: 3.8.12
[Dec 01 02:05 PM]: Running funannotate v1.8.14
[Dec 01 02:05 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Dec 01 02:05 PM]: Skipping CodingQuarry as no --rna_bam passed
[Dec 01 02:05 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained     
  glimmerhmm   busco          
  snap         busco          
[Dec 01 02:05 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Dec 01 02:05 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Dec 01 02:05 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Dec 01 02:05 PM]: Found 1,505 preliminary alignments with diamond in 0:00:03 --> generated FASTA files for exonerate in 0:00:00
     Progress: 1505 complete, 0 failed, 0 remaining          
[Dec 01 02:06 PM]: Exonerate finished in 0:00:33: found 1,270 alignments
[Dec 01 02:06 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Dec 01 02:18 PM]: 373 valid BUSCO predictions found, validating protein sequences
[Dec 01 02:19 PM]: 370 BUSCO predictions validated
[Dec 01 02:19 PM]: Running Augustus gene prediction using saccharomyces parameters
     Progress: 11 complete, 0 failed, 0 remaining        
[Dec 01 02:21 PM]: 1,485 predictions from Augustus
[Dec 01 02:21 PM]: Pulling out high quality Augustus predictions
[Dec 01 02:21 PM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Dec 01 02:21 PM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Dec 01 02:24 PM]: 1,493 predictions from SNAP
[Dec 01 02:24 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Dec 01 02:28 PM]: 1,770 predictions from GlimmerHMM
[Dec 01 02:28 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        1325 
  Augustus HiQ   2        372  
  GlimmerHMM     1        1770 
  snap           1        1493 
  Total          -        4960 
[Dec 01 02:28 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
     Progress: 48 complete, 0 failed, 0 remaining         
[Dec 01 02:38 PM]: Converting to GFF3 and collecting all EVM results
[Dec 01 02:38 PM]: 1,686 total gene models from EVM
[Dec 01 02:38 PM]: Generating protein fasta files from 1,686 EVM models
[Dec 01 02:38 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Dec 01 02:38 PM]: Found 134 gene models to remove: 0 too short; 0 span gaps; 134 transposable elements
[Dec 01 02:38 PM]: 1,552 gene models remaining
[Dec 01 02:38 PM]: Predicting tRNAs
[Dec 01 02:38 PM]: 112 tRNAscan models are valid (non-overlapping)
[Dec 01 02:38 PM]: Generating GenBank tbl annotation file
[Dec 01 02:38 PM]: Collecting final annotation files for 1,664 total gene models
[Dec 01 02:38 PM]: Converting to final Genbank format
[Dec 01 02:39 PM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Dec 01 02:39 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install): 
funannotate iprscan -i annotate -c 4

Run antiSMASH (optional): 
funannotate remote -i annotate -m antismash -e [email protected]

Annotate Genome: 
funannotate annotate -i annotate --cpus 4 --sbt yourSBTfile.txt
-------------------------------------------------------
                
[Dec 01 02:39 PM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json
[Dec 01 02:39 PM]: Add species parameters to database:

  funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json

#########################################################
SUCCESS: `funannotate predict` test complete.
#########################################################

Dec 01 '22 22:12 nextgenusfs

Hi @nextgenusfs, I also got the same error with funannotate v1.8.13.

Here is the command: funannotate annotate -i predict_results/ --eggnog eggNOG_NAG/Cryptococcus_spT15_22C.emapper.annotations --iprscan Cryptococcus_spT15_22C.proteins.fa.xml --cpu 20 --out Annotations

This is the log file:

[12/01/22 22:09:19]: /home/kalonjilab2/miniconda3/envs/funannotate/bin/funannotate annotate -i predict_results/ --eggnog eggNOG_NAG/Cryptococcus_spT15_22C.emapper.annotations --iprscan Cryptococcus_spT15_22C.proteins.fa.xml --cpu 20 --out Annotations
[12/01/22 22:09:19]: OS: Ubuntu 20.04, 40 cores, ~ 99 GB RAM. Python: 3.8.13
[12/01/22 22:09:19]: Running 1.8.13
[12/01/22 22:09:19]: hmmscan version=HMMER 3.3.2 (Nov 2020) path=/home/kalonjilab2/miniconda3/envs/funannotate/bin/hmmscan
[12/01/22 22:09:19]: hmmsearch version=HMMER 3.3.2 (Nov 2020) path=/home/kalonjilab2/miniconda3/envs/funannotate/bin/hmmsearch
[12/01/22 22:09:19]: diamond version=2.0.15 path=/home/kalonjilab2/miniconda3/envs/funannotate/bin/diamond
[12/01/22 22:09:19]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[12/01/22 22:09:19]: Found existing output directory /home/kalonjilab2/Naganishia_Assembly/functional annotation. Warning, will re-use any intermediate files found.
[12/01/22 22:09:19]: Parsing input files
[12/01/22 22:09:19]: Existing tbl found: /home/kalonjilab2/Naganishia_Assembly/functional annotation/predict_results/Cryptococcus_spT15_22C.tbl
[12/01/22 22:09:22]: TBL file: /home/kalonjilab2/Naganishia_Assembly/functional annotation/annotate_misc/genome.tbl
[12/01/22 22:09:22]: GFF3 file: /home/kalonjilab2/Naganishia_Assembly/functional annotation/predict_results/Cryptococcus_spT15_22C.gff3
[12/01/22 22:09:22]: Proteins file: /home/kalonjilab2/Naganishia_Assembly/functional annotation/annotate_misc/genome.proteins.fasta
[12/01/22 22:09:25]: Adding Functional Annotation to Cryptococcus_spT15_22C, NCBI accession: None
[12/01/22 22:09:25]: Annotation consists of: 4,802 gene models
[12/01/22 22:09:25]: 4,742 protein records loaded
[12/01/22 22:09:25]: Existing Pfam-A results found: /home/kalonjilab2/Naganishia_Assembly/functional annotation/annotate_misc/annotations.pfam.txt
[12/01/22 22:09:25]: 7,374 annotations added
[12/01/22 22:09:25]: Running Diamond blastp search of UniProt DB version 2022_04
[12/01/22 22:09:26]: 381 valid gene/product annotations from 490 total
[12/01/22 22:09:26]: Existing Eggnog-mapper results found: /home/kalonjilab2/Naganishia_Assembly/functional annotation/annotate_misc/eggnog.emapper.annotations
[12/01/22 22:09:26]: Parsing EggNog Annotations
[12/01/22 22:09:26]: EggNog version parsed as 2.1.9
[12/01/22 22:09:26]: EggNog annotation detected as emapper v2.1.9 and DB prefix ENOG50
[12/01/22 22:09:26]: 9,178 COG and EggNog annotations added
[12/01/22 22:09:26]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.84
[12/01/22 22:09:27]: 1,961 gene name and product description annotations added
[12/01/22 22:09:27]: Existing MEROPS results found: /home/kalonjilab2/Naganishia_Assembly/functional annotation/annotate_misc/annotations.merops.txt
[12/01/22 22:09:27]: 205 annotations added
[12/01/22 22:09:27]: Existing CAZYme results found: /home/kalonjilab2/Naganishia_Assembly/functional annotation/annotate_misc/annotations.dbCAN.txt
[12/01/22 22:09:27]: 175 annotations added
[12/01/22 22:09:27]: Existing BUSCO2 results found: /home/kalonjilab2/Naganishia_Assembly/functional annotation/annotate_misc/annotations.busco.txt
[12/01/22 22:09:27]: 1,084 annotations added
[12/01/22 22:09:27]: Skipping phobius predictions, try funannotate remote -m phobius
[12/01/22 22:09:27]: Skipping secretome: neither SignalP nor Phobius searches were run
[12/01/22 22:09:27]: 0 secretome and 0 transmembane annotations added
[12/01/22 22:09:27]: Parsing InterProScan5 XML file
[12/01/22 22:09:27]: /home/kalonjilab2/miniconda3/envs/funannotate/bin/python /home/kalonjilab2/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/aux_scripts/iprscan2annotations.py /home/kalonjilab2/Naganishia_Assembly/functional annotation/annotate_misc/iprscan.xml /home/kalonjilab2/Naganishia_Assembly/functional annotation/annotate_misc/annotations.iprscan.txt
[12/01/22 22:10:53]: Found 0 duplicated annotations, adding 42,464 valid annotations
[12/01/22 22:10:54]: Parsing tbl file: /home/kalonjilab2/Naganishia_Assembly/functional annotation/annotate_misc/genome.tbl
[12/01/22 22:10:54]: Converting to final Genbank format, good luck!
[12/01/22 22:10:54]: /home/kalonjilab2/miniconda3/envs/funannotate/bin/python /home/kalonjilab2/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/aux_scripts/tbl2asn_parallel.py -i /home/kalonjilab2/Naganishia_Assembly/functional annotation/annotate_misc/tbl2asn/genome.tbl -f /home/kalonjilab2/Naganishia_Assembly/functional annotation/annotate_misc/tbl2asn/genome.fsa -o /home/kalonjilab2/Naganishia_Assembly/functional annotation/annotate_misc/tbl2asn --sbt /home/kalonjilab2/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/config/test.sbt -d discrepency.report.txt -s Cryptococcus_spT15_22C -t -l paired-ends -v 1 -c 20
[12/01/22 22:10:55]: ERROR: GBK file conversion failed, tbl2asn parallel script has died

Dec 02 '22 05:12 kalonji08

Did you update tbl2asn? You probably need to download it directly from NCBI. NCBI put a strange timeout into tbl2asn, so if the conda build is more than a year old it will error out and tell the user to upgrade.
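
A rough way to check whether the tbl2asn on PATH might be old enough to hit that timeout. This is only a heuristic sketch: it assumes the binary's file modification time roughly tracks its build date, which is not guaranteed (copying or re-extracting can reset mtimes), and the 365-day threshold simply mirrors the "more than a year old" rule of thumb above.

```python
import os
import shutil
import time
from typing import Optional

SECONDS_PER_DAY = 86400.0


def days_since_modified(path: str, now: Optional[float] = None) -> float:
    """Age of a file in days, based on its modification time."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / SECONDS_PER_DAY


def tbl2asn_age_warning(max_days: float = 365.0) -> Optional[str]:
    """Heuristic: flag the tbl2asn on PATH if its file looks older than max_days."""
    exe = shutil.which("tbl2asn")
    if exe is None:
        return "tbl2asn not found on PATH"
    real = os.path.realpath(exe)  # follow any symlinks to the actual binary
    age = days_since_modified(real)
    if age > max_days:
        return f"{real} last modified {age:.0f} days ago; it may have hit NCBI's timeout"
    return None


if __name__ == "__main__":
    print(tbl2asn_age_warning() or "tbl2asn looks recent enough (by mtime)")
```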

Dec 02 '22 06:12 nextgenusfs

Yes, I did, but I'm still getting the same error. Which funannotate version are you using? If your version is running fine, maybe I can install it.

Dec 02 '22 12:12 kalonji08

Hi @nextgenusfs, I just wanted to let you know that I found a way around the issue. I:

1. re-ran the prediction with a preclassified Augustus species;
2. downgraded InterProScan to v5.52-86.0, as mentioned in #830;
3. ran funannotate iprscan locally;
4. then tried annotate again, and it worked.

Hopefully something similar will work for those who have the same error.

Dec 02 '22 21:12 kalonji08

I’m getting the same error (GBK file conversion failed, tbl2asn parallel script has died), but only at the annotate step; predict worked fine. Mine hangs at “Converting to final Genbank format, good luck!”, seemingly indefinitely, and when I hit enter it spits out the error and quits.

I’m running version 1.8.15, installed with conda.

I’ve already got interproscan-version="5.52-86.0", as mentioned by @kalonji08. My tbl2asn version is 25.8 (also NCBI’s latest version).

command:

funannotate annotate -i Can_funannotate \
--iprscan Can_funannotate/annotate_misc/iprscan.xml \
--antismash Can_funannotate/antismash/S*.gbk \
--eggnog Can_funannotate/eggnog.emapper.annotations \
--cpus 5 --sbt ~/Genus/sbt/Spp*.sbt

Log:
[05/22/23 10:18:41]: OS: Ubuntu 18.04, 176 cores, ~ 528 GB RAM. Python: 3.8.12
[05/22/23 10:18:41]: Running 1.8.15
[05/22/23 10:18:42]: hmmscan version=HMMER 3.3.2 (Nov 2020) path=/home/ldapusers/janneke.aylward/anaconda3/envs/funannotate/bin/hmmscan
[05/22/23 10:18:42]: hmmsearch version=HMMER 3.3.2 (Nov 2020) path=/home/ldapusers/janneke.aylward/anaconda3/envs/funannotate/bin/hmmsearch
[05/22/23 10:18:42]: diamond version=2.0.15 path=/home/ldapusers/janneke.aylward/anaconda3/envs/funannotate/bin/diamond
[05/22/23 10:18:42]: bedtools version=bedtools v2.30.0 path=/home/ldapusers/janneke.aylward/anaconda3/envs/funannotate/bin/bedtools
[05/22/23 10:18:42]: Found existing output directory Spp_funannotate. Warning, will re-use any intermediate files found.
[05/22/23 10:18:42]: Parsing input files
[05/22/23 10:18:42]: Existing tbl found: Can_funannotate/predict_results/Spp_CMW58288.tbl
[05/22/23 10:18:51]: TBL file: Can_funannotate/annotate_misc/genome.tbl
[05/22/23 10:18:51]: GFF3 file: Can_funannotate/predict_results/Spp_CMWXXXXX.gff3
[05/22/23 10:18:51]: Proteins file: Can_funannotate/annotate_misc/genome.proteins.fasta
[05/22/23 10:19:06]: Adding Functional Annotation to Spp, NCBI accession: None
[05/22/23 10:19:06]: Annotation consists of: 13,459 gene models
[05/22/23 10:19:06]: 13,299 protein records loaded
[05/22/23 10:19:06]: Existing Pfam-A results found: Can_funannotate/annotate_misc/annotations.pfam.txt
[05/22/23 10:19:06]: 14,338 annotations added
[05/22/23 10:19:06]: Running Diamond blastp search of UniProt DB version 2022_02
[05/22/23 10:19:09]: 862 valid gene/product annotations from 1,271 total
[05/22/23 10:19:10]: Existing Eggnog-mapper results found: Can_funannotate/annotate_misc/eggnog.emapper.annotations
[05/22/23 10:19:10]: Parsing EggNog Annotations
[05/22/23 10:19:10]: EggNog version parsed as 2.1.6
[05/22/23 10:19:10]: EggNog annotation detected as emapper v2.1.6 and DB prefix ENOG50
[05/22/23 10:19:10]: 25,178 COG and EggNog annotations added
[05/22/23 10:19:10]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.80
[05/22/23 10:19:11]: 2,768 gene name and product description annotations added
[05/22/23 10:19:11]: Existing MEROPS results found: Can_funannotate/annotate_misc/annotations.merops.txt
[05/22/23 10:19:11]: 479 annotations added
[05/22/23 10:19:11]: Existing CAZYme results found: Can_funannotate/annotate_misc/annotations.dbCAN.txt
[05/22/23 10:19:11]: 741 annotations added
[05/22/23 10:19:11]: Existing BUSCO2 results found: Can_funannotate/annotate_misc/annotations.busco.txt
[05/22/23 10:19:11]: 1,247 annotations added
[05/22/23 10:19:11]: Skipping phobius predictions, try funannotate remote -m phobius
[05/22/23 10:19:11]: Existing SignalP results found: Can_funannotate/annotate_misc/signalp.results.txt
[05/22/23 10:19:12]: 1,275 secretome and 0 transmembane annotations added
[05/22/23 10:19:16]: Now parsing antiSMASH v6 results, finding SM clusters
[05/22/23 10:19:20]: Found 84 clusters, 208 biosynthetic enyzmes, and 282 smCOGs predicted by antiSMASH
[05/22/23 10:19:29]: Found 0 duplicated annotations, adding 100,150 valid annotations
[05/22/23 10:19:31]: Parsing tbl file: /home/ldapusers/janneke.aylward/Genus/Spp/Can_funannotate/annotate_misc/genome.tbl
[05/22/23 10:19:31]: Converting to final Genbank format, good luck!
[05/22/23 10:19:31]: /home/ldapusers/janneke.aylward/anaconda3/envs/funannotate/bin/python /home/ldapusers/janneke.aylward/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/aux_scripts/tbl2asn_parallel.py -i Can_funannotate/annotate_misc/tbl2asn/genome.tbl -f Can_funannotate/annotate_misc/tbl2asn/genome.fsa -o Can_funannotate/annotate_misc/tbl2asn --sbt /home/ldapusers/janneke.aylward/Genus/sbt/Spp.sbt -d discrepency.report.txt -s Spp -t -l paired-ends -v 1 -c 5
[05/22/23 10:22:28]: ERROR: GBK file conversion failed, tbl2asn parallel script has died

[EDIT: I tried this in Docker as well and I’m getting the exact same issue. Funannotate predict runs fine. tbl2asn fails at the end of the annotate step]
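
Since the run only ends when Enter is pressed, one thing worth ruling out is the logged tbl2asn_parallel.py command blocking on terminal input. That is an assumption, not something this thread establishes. A minimal sketch that re-runs a command with stdin closed (any read gets EOF immediately) and a hard time limit; the command list in the example is a placeholder for the full logged command.

```python
import subprocess
from typing import List, Optional


def run_noninteractive(cmd: List[str], timeout_s: float) -> Optional[subprocess.CompletedProcess]:
    """Run cmd with stdin closed and a hard time limit.

    Returns the CompletedProcess, or None if the time limit was hit.
    """
    try:
        return subprocess.run(cmd, stdin=subprocess.DEVNULL, capture_output=True,
                              text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child here; grandchildren it spawned
        # (e.g. parallel workers) are not guaranteed to be cleaned up.
        return None


if __name__ == "__main__":
    cmd = ["python", "tbl2asn_parallel.py", "-i", "genome.tbl"]  # placeholder: paste the full logged command
    result = run_noninteractive(cmd, timeout_s=3600)
    print("timed out" if result is None else result.stderr[-2000:])
```

If it now finishes (or fails with a readable message) instead of hanging, that would point toward something in the pipeline waiting on the terminal.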

May 22 '23 08:05 jaylward2
