Bakta generated neither circular genome plots nor error messages when running the circos command
Describe the bug
Bakta (version 1.9.2) generated neither circular genome annotation plots nor any error message for the circos command. When I executed the circos command manually, I got the error messages shown below. The main issue is that Bakta does not detect that the circos command failed because required Perl modules could not be loaded.
$ circos -conf /tmp/tmp5xvqgawv/circos/main.conf
*** REQUIRED MODULE(S) MISSING OR OUT-OF-DATE ***
You are missing one or more Perl modules, require newer versions, or some modules failed to load. Use CPAN to install it as described in this tutorial
http://www.circos.ca/documentation/tutorials/configuration/perl_and_modules
missing GD
error Can't locate GD.pm in @INC (you may need to install the GD module) (@INC contains: /home/xjm/mambaforge-pypy3/envs/TG_assembly/bin/lib /home/xjm/mambaforge-pypy3/envs/TG_assembly/bin/../lib /home/xjm/mambaforge-pypy3/envs/TG_assembly/bin /home/xjm/mambaforge-pypy3/envs/TG_assembly/lib/perl5/5.32/site_perl /home/xjm/mambaforge-pypy3/envs/TG_assembly/lib/perl5/site_perl /home/xjm/mambaforge-pypy3/envs/TG_assembly/lib/perl5/5.32/vendor_perl /home/xjm/mambaforge-pypy3/envs/TG_assembly/lib/perl5/vendor_perl /home/xjm/mambaforge-pypy3/envs/TG_assembly/lib/perl5/5.32/core_perl /home/xjm/mambaforge-pypy3/envs/TG_assembly/lib/perl5/core_perl .) at (eval 38) line 1.
missing GD::Polyline
error Can't locate GD/Polyline.pm in @INC (you may need to install the GD::Polyline module) (@INC contains: /home/xjm/mambaforge-pypy3/envs/TG_assembly/bin/lib /home/xjm/mambaforge-pypy3/envs/TG_assembly/bin/../lib /home/xjm/mambaforge-pypy3/envs/TG_assembly/bin /home/xjm/mambaforge-pypy3/envs/TG_assembly/lib/perl5/5.32/site_perl /home/xjm/mambaforge-pypy3/envs/TG_assembly/lib/perl5/site_perl /home/xjm/mambaforge-pypy3/envs/TG_assembly/lib/perl5/5.32/vendor_perl /home/xjm/mambaforge-pypy3/envs/TG_assembly/lib/perl5/vendor_perl /home/xjm/mambaforge-pypy3/envs/TG_assembly/lib/perl5/5.32/core_perl /home/xjm/mambaforge-pypy3/envs/TG_assembly/lib/perl5/core_perl .) at (eval 39) line 1.
I created my environment using the following command:
mamba env create -n TG_assembly --file env.yaml
My env.yaml file:
channels:
- conda-forge
- bioconda
dependencies:
- fastp
- flye
- minimap2
- samtools
- bakta
- prodigal
- biopython
- pandas
Therefore, please provide us with at least the following information:
- Which commands have been executed and what exactly happened
bakta -f --db ~/db/bakta/db --verbose --output bakta_output/V17 --prefix V17 --locus-tag V17 --replicons bakta_output/V17/bakta_replicons.tsv --threads 127 TG_genome/V17.fasta
- Detailed logs (execute Bakta with --debug)
14:01:20.422 - INFO - UTILS - coding-ratio=0.918
14:01:20.427 - INFO - TSV - write tsv: path=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17/V17.tsv
14:01:20.436 - INFO - GFF - write GFF3: path=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17/V17.gff3
14:01:20.502 - DEBUG - INSDC - prepare: genbank=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17/V17.gbff, embl=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17/V17.embl
14:01:20.593 - INFO - INSDC - write GenBank: path=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17/V17.gbff
14:01:21.112 - INFO - INSDC - write EMBL: path=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17/V17.embl
14:01:21.780 - INFO - FASTA - write genome sequences: path=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17/V17.fna, description=True, wrap=True
14:01:21.804 - INFO - FASTA - write feature nucleotide sequences: path=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17/V17.ffn
14:01:21.815 - INFO - FASTA - write translated CDS/sORF: path=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17/V17.faa
14:01:22.615 - DEBUG - PLOT - write gc config: seq-length=5973571, step-size=1659, window-size=3318, max-gc=0, max-gc-skew=0
14:01:22.617 - DEBUG - PLOT - write main config: file-name=V17, output-dir=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17, multiplier=0.001, label-prefixkbp
14:01:22.617 - INFO - PLOT - write circular genome plot: file-name=V17, output-dir=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17
14:01:22.617 - DEBUG - PLOT - cmd=['circos', '-conf', PosixPath('/tmp/tmp5xvqgawv/circos/main.conf')]
14:01:22.802 - INFO - TSV - write hypothetical tsv: path=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17/V17.hypotheticals.tsv
14:01:22.808 - INFO - FASTA - write translated CDS/sORF: path=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17/V17.hypotheticals.faa
14:01:22.810 - INFO - JSON - write JSON: path=/home/xjm/cyr/Verru17_pacbio/bakta_output/V17/V17.json
- How did you install Bakta:
Conda
Hi, and thanks for reporting. This is odd. Bakta checks the return code of every external process it executes and treats any value other than 0 as a failure. For some reason, Circos/Perl seems to exit with code 0 here even though the required modules are missing.
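To illustrate the failure mode, here is a minimal Python sketch of a return-code check that also scans the captured output for a known failure message. This is not Bakta's actual code: `run_checked` and `FAILURE_MARKERS` are hypothetical names, and the marker text is simply copied from the Circos output quoted above. It only assumes that the tool prints that message while still exiting with 0.

```python
import subprocess
import sys

# Marker text copied from the Circos output quoted in this issue (illustrative only).
FAILURE_MARKERS = ("REQUIRED MODULE(S) MISSING OR OUT-OF-DATE",)


def run_checked(cmd, failure_markers=FAILURE_MARKERS):
    """Run cmd and raise RuntimeError on a non-zero exit code or a known failure marker."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(f"command failed with exit code {proc.returncode}: {cmd}")
    # A return code of 0 alone is not trusted: also inspect stdout and stderr.
    output = proc.stdout + proc.stderr
    for marker in failure_markers:
        if marker in output:
            raise RuntimeError(f"command exited with 0 but reported: {marker}")
    return proc


if __name__ == "__main__":
    # Stand-in for a tool that prints an error message but still exits with 0.
    fake = [sys.executable, "-c",
            "print('*** REQUIRED MODULE(S) MISSING OR OUT-OF-DATE ***')"]
    try:
        run_checked(fake)
    except RuntimeError as err:
        print(err)
```

Matching on output text is brittle, so a check like this would only complement the return-code test rather than replace it.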
Obviously, this is not a bug within Bakta but rather seems to be an issue with Mamba/Conda and solving the dependencies of Circos. Hence, could you try to explicitly add Circos to your env.yaml file? Maybe this helps.
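After recreating the environment, one way to confirm that the Perl modules named in the error output (GD and GD::Polyline) can be loaded is to ask Perl to import each of them. The sketch below wraps the standard `perl -M<Module> -e 1` invocation; the helper names are hypothetical, and it assumes the `perl` of the activated environment is first on `PATH`.

```python
import shutil
import subprocess


def perl_check_cmd(module, perl="perl"):
    """Build the command that asks Perl to load a single module and exit."""
    return [perl, f"-M{module}", "-e", "1"]


def missing_perl_modules(modules, perl="perl"):
    """Return the modules that the given perl executable fails to load."""
    if shutil.which(perl) is None:
        raise FileNotFoundError(f"perl executable not found: {perl}")
    missing = []
    for module in modules:
        proc = subprocess.run(perl_check_cmd(module, perl),
                              capture_output=True, text=True)
        # Perl exits non-zero when the module cannot be located or loaded.
        if proc.returncode != 0:
            missing.append(module)
    return missing


if __name__ == "__main__":
    # Module names taken from the Circos error output in this issue.
    print(missing_perl_modules(["GD", "GD::Polyline"]))
```

An empty list would indicate that both modules load in the current environment.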
I have another question. The locus_tag numbers generated by Bakta increase in steps of 5. Is there any parameter to configure it to generate consecutive numbers like X_0001, X_0002, X_0003, ...?
No, currently there is no such parameter. We set the locus tag step size to a default of 5 to leave some space for potential refinements and additions after the Bakta annotation.
Is there a specific reason or use case behind your question? If there is a need to configure the locus tag step size, I'd consider adding such a parameter.
In certain scenarios, like gene cluster scanning and functional gene analysis, consecutive numbers give a clear indication that the genes are adjacent in the genome. They also make it easy to estimate the positional distance between any two locus tags. Additionally, most other tools (PGAP, Prokka) use consecutive numbers by default, which led to my initial confusion when I first browsed the Bakta results: I mistakenly thought that there were multiple non-coding genes between two locus tags that differed by 5. I hope such a parameter will be available in a future version.
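As a small illustration of the numbering discussed here, the following Python sketch generates locus tags with a configurable step size and estimates how many features lie between two tags. It is not Bakta's implementation: the function names and the zero-padding width of 5 are assumptions made for the example, and the distance estimate only holds as long as no tags were inserted manually afterwards.

```python
def make_locus_tags(prefix, count, step=5, width=5):
    """Generate `count` locus tags whose numbers increase by `step`."""
    return [f"{prefix}_{i * step:0{width}d}" for i in range(1, count + 1)]


def tag_number(tag):
    """Extract the numeric part after the last underscore."""
    return int(tag.rsplit("_", 1)[1])


def features_apart(tag_a, tag_b, step=5):
    """Estimate how many annotated features separate two locus tags."""
    return abs(tag_number(tag_a) - tag_number(tag_b)) // step


if __name__ == "__main__":
    print(make_locus_tags("V17", 3))           # step size of 5
    print(make_locus_tags("V17", 3, step=1))   # consecutive numbering
    print(features_apart("V17_00005", "V17_00020"))
```

With a step size of 5, dividing the difference between two tag numbers by 5 recovers the same distance that consecutive numbering would show directly.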
@1996xjm, I just opened a dedicated issue for that: #279, so we can discuss it in its own space. With that, I'll close this one.