bakta
Exception: diamond error! error code: -6
Hi @oschwengers, thanks for this tool! I'm excited to use it on my MAG dataset. I installed Bakta v1.5.0 with conda and ran into an issue similar to #105.
Here's the command I used to test on one of the MAGs:
bakta --db /In_house/Metagenomic/oral_microbiome/bakta/db --verbose --output /400MGS/data/analysis/CH_NA0009561695_S222/bakta/ --prefix CH_NA0009561695_S222 --threads 16 /CH_NA0009561695_S222/reassemble/reassembled_bins/bin.2.orig.fa
It was running perfectly fine up to this error:
predict & annotate CDSs...
predicted: 2281
discarded spurious: 0
revised translational exceptions: 0
detected IPSs: 679
found PSCs: 1387
found PSCCs: 95
lookup annotations...
conduct expert systems...
amrfinder: 3
protein sequences: 0
combine annotations and mark hypotheticals...
detect pseudogenes...
pseudogene candidates: 38
Traceback (most recent call last):
File "/scratch/gencore/.eb/2.0/software/bakta/1.5.0/bin/bakta", line 10, in <module>
sys.exit(main())
File "/scratch/gencore/.eb/2.0/software/bakta/1.5.0/lib/python3.10/site-packages/bakta/main.py", line 286, in main
pseudogenes = feat_cds.detect_pseudogenes(pseudo_candidates, cdss, genome) if len(pseudo_candidates) > 0 else []
File "/scratch/gencore/.eb/2.0/software/bakta/1.5.0/lib/python3.10/site-packages/bakta/features/cds.py", line 616, in detect_pseudogenes
raise Exception(f'diamond error! error code: {proc.returncode}\n{proc.stdout}')
Exception: diamond error! error code: -6
The below lines are from the log file:
#CPU threads: 16
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: /tmpdata/tmpvjye3g74
#Target sequences to report alignments for: 25
Opening the database... [0s]
Database: /tmpdata/tmpvjye3g74/cds.pseudo.references.dmnd (type: Diamond database, sequences: 38, letters: 15032)
Block size = 3000000000
Opening the input file... [0s]
Opening the output file... [0s]
Loading query sequences... [0s]
Masking queries... [0.007s]
Algorithm: Double-indexed
Building query histograms... [0.005s]
Allocating buffers... [0.003s]
Loading reference sequences... [0s]
Masking reference... [0.003s]
Initializing temporary storage... [0.001s]
Building reference histograms... [0.003s]
Allocating buffers... [0.003s]
Processing query block 1, reference block 1/1, shape 1/64.
Building reference seed array... [0.001s]
Building query seed array... [0.001s]
Computing hash join... [0s]
Masking low complexity seeds... [0s]
Searching alignments... [0.005s]
Processing query block 1, reference block 1/1, shape 2/64.
Building reference seed array... [0s]
Building query seed array... [0.001s]
Computing hash join... [0s]
Masking low complexity seeds... [0s]
Searching alignments... [0.001s]
Processing query block 1, reference block 1/1, shape 3/64.
Building reference seed array... [0s]
Building query seed array... [0.001s]
Computing hash join... terminate called without an active exception
'
23:59:09.882 - WARNING - CDS - PSEUDO failed! diamond-error-code=-6
23:59:10.104 - INFO - MAIN - removed tmp dir: /tmpdata/tmpvjye3g74
I've also tried with --threads 24 and got the same error.
Thanks for your help with this!
Thanks @Ahmed-Shibl for reaching out and reporting!
We also noticed this issue only yesterday; it might be related to, and fixed by, #131.
I've just released patch version v1.5.1. Could you please update to v1.5.1 and check if this issue remains?
Thanks a lot and best regards!
Hi, I've also recently run into a similar error on v1.5.1.
Last few lines of the log are:
10:24:13.110 - INFO - ANNOTATION - fix product: replace domain name underscores. new=Transket-pyr domain-containing protein, old=Transket_pyr domain-containing protein
10:24:13.110 - INFO - ANNOTATION - fix product: replace putative synonyms. new=putative oxaloacetate decarboxylase gamma chain, old=Probable oxaloacetate decarboxylase gamma chain
10:24:13.111 - INFO - ORF - write internal aa seqs: # seqs=278, path=/home/dfornika/tmp/auto-cpo-work/work-IDENTIFIER_REDACTED_routine-assembly_20221001094039/13/e00e655c5236a2abe9a5d9f97fff4d/tmp/tmpxsqa4yeo/cds.pseudo.candidates.faa
10:25:50.572 - WARNING - CDS - Diamond failed! diamond-error-code=-6
...and this is the error message on stderr:
Traceback (most recent call last):
File "/home/dfornika/.conda/envs/bakta-ac53990dc7e86025e9d770b67eb80396/bin/bakta", line 10, in <module>
sys.exit(main())
File "/home/dfornika/.conda/envs/bakta-ac53990dc7e86025e9d770b67eb80396/lib/python3.10/site-packages/bakta/main.py", line 289, in main
pseudo_candidates = feat_cds.predict_pseudo_candidates(hypotheticals)
File "/home/dfornika/.conda/envs/bakta-ac53990dc7e86025e9d770b67eb80396/lib/python3.10/site-packages/bakta/features/cds.py", line 519, in predict_pseudo_candidates
raise Exception(f'diamond error! error code: {proc.returncode}')
Exception: diamond error! error code: -6
Thanks @dfornika for reporting! Is this reproducible for a given genome? And how much RAM is available?
@dfornika Could you please run Bakta with the --debug flag set and provide the *.log file? This will further help to pinpoint the issue.
I'll collect that info and follow up as soon as I can.
It looked like it was a RAM issue for me because when I ran it again with higher memory it worked just fine. Thanks!
Thanks for the update @Ahmed-Shibl. Now, I'm curious if it's the same story for @dfornika. We often see Diamond issues with varying error codes, mainly 6 and 9. I think it would be very interesting and helpful to ask @bbuchfink if we could get a list of all Diamond error codes with their descriptions.
I may be a bit slow to reproduce this and pass along more detailed info. But I can say that I was running bakta in the context of this pipeline: https://github.com/BCCDC-PHL/routine-assembly, on a new SLURM cluster where I believe each job gets 8 GB memory by default. So it's entirely possible that my issues are memory-related too. I may try throttling the memory down a bit to see if that triggers the error more consistently.
The only exit codes I use in Diamond are 0 for success and 1 for failure. Other exit codes are generated by the OS, e.g. 9 should mean the process was killed for running out of memory.
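For context on the -6 in the tracebacks above: on POSIX systems, Python's `subprocess` module reports a child process that was terminated by signal N as a `returncode` of `-N`. So -6 corresponds to SIGABRT (consistent with the "terminate called without an active exception" line in the Diamond log) and -9 to SIGKILL. Below is a minimal sketch of decoding such values; `describe_returncode` is a hypothetical helper written for illustration, not part of Bakta or Diamond.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a subprocess return code using POSIX semantics.

    Hypothetical helper for illustration only.
    """
    if returncode == 0:
        return "success"
    if returncode > 0:
        # The program exited on its own with a non-zero status.
        return f"exited with status {returncode}"
    # Negative values mean the child was terminated by a signal.
    signum = -returncode
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = f"unknown signal {signum}"
    return f"terminated by {name}"


if __name__ == "__main__":
    for code in (0, 1, -6, -9):
        print(code, "->", describe_returncode(code))
```

Note that a signal name alone does not prove the cause: SIGABRT and SIGKILL can both occur for reasons other than memory exhaustion, so the available RAM still needs to be checked separately.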
Thank you so much @bbuchfink for the very quick reply! This info is very helpful for us to trace the cause of this and future issues.
@dfornika, sorry for the late reply. 8 GB is definitely not enough for Diamond in the Bakta setup. For a high-throughput SLURM setup, I would recommend 8 threads and 16 GB memory. From our SLURM experience, this works very well for most genomes. Occasionally, we see executions that require even more, but these are very rare cases.
If an optimization of overall throughput is desired, one could reduce the cores from 8 to 4 since some internal steps do not scale very well. Of course, in turn this results in longer runtimes per genome and doesn't change the memory requirements.
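As a rough illustration of the resource recommendation above, the following sketch assembles an `sbatch` submission that requests 8 CPUs and 16 GB of memory for a single Bakta run. The function name `build_sbatch_command` and all paths are hypothetical placeholders; the Bakta options are the ones used earlier in this thread, and the `sbatch` options (`--cpus-per-task`, `--mem`, `--wrap`) may need adapting to a given cluster's configuration.

```python
import shlex


def build_sbatch_command(genome, db, output_dir, prefix, threads=8, mem_gb=16):
    """Build an sbatch argument list for one Bakta run.

    Hypothetical helper; the defaults follow the 8 threads / 16 GB suggestion.
    """
    bakta_cmd = [
        "bakta",
        "--db", db,
        "--output", output_dir,
        "--prefix", prefix,
        "--threads", str(threads),
        genome,
    ]
    return [
        "sbatch",
        f"--cpus-per-task={threads}",
        f"--mem={mem_gb}G",
        # --wrap lets sbatch run a single command without a separate script file.
        f"--wrap={shlex.join(bakta_cmd)}",
    ]


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    cmd = build_sbatch_command("genome.fa", "/path/to/db", "out", "sample")
    print(shlex.join(cmd))
```

Passing `threads=4` would follow the throughput-oriented variant described above while leaving the memory request unchanged.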
Since I'm pretty sure that the initial issue is caused by these memory restrictions, I'll close this for now. However, if these issues remain even when using more memory, please do not hesitate to re-open it. Thanks again and best regards!