bakta Exception: diamond error! error code: -6

Open Ahmed-Shibl opened this issue 1 year ago • 10 comments

Hi @oschwengers, thanks for this tool! I'm excited to use it on my MAGs dataset. I installed Bakta v1.5.0 with conda and ran into an issue similar to #105.

Here's the command I used to test on one of the MAGs:

bakta --db /In_house/Metagenomic/oral_microbiome/bakta/db --verbose --output /400MGS/data/analysis/CH_NA0009561695_S222/bakta/ --prefix CH_NA0009561695_S222 --threads 16 /CH_NA0009561695_S222/reassemble/reassembled_bins/bin.2.orig.fa

It was running perfectly fine up to this error:

predict & annotate CDSs...
	predicted: 2281 
	discarded spurious: 0
	revised translational exceptions: 0
	detected IPSs: 679
	found PSCs: 1387
	found PSCCs: 95
	lookup annotations...
	conduct expert systems...
		amrfinder: 3
		protein sequences: 0
	combine annotations and mark hypotheticals...
	detect pseudogenes...
		pseudogene candidates: 38
Traceback (most recent call last):
  File "/scratch/gencore/.eb/2.0/software/bakta/1.5.0/bin/bakta", line 10, in <module>
    sys.exit(main())
  File "/scratch/gencore/.eb/2.0/software/bakta/1.5.0/lib/python3.10/site-packages/bakta/main.py", line 286, in main
    pseudogenes = feat_cds.detect_pseudogenes(pseudo_candidates, cdss, genome) if len(pseudo_candidates) > 0 else []
  File "/scratch/gencore/.eb/2.0/software/bakta/1.5.0/lib/python3.10/site-packages/bakta/features/cds.py", line 616, in detect_pseudogenes
    raise Exception(f'diamond error! error code: {proc.returncode}\n{proc.stdout}')
Exception: diamond error! error code: -6

The below lines are from the log file:

#CPU threads: 16
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: /tmpdata/tmpvjye3g74
#Target sequences to report alignments for: 25
Opening the database...  [0s]
Database: /tmpdata/tmpvjye3g74/cds.pseudo.references.dmnd (type: Diamond database, sequences: 38, letters: 15032)
Block size = 3000000000
Opening the input file...  [0s]
Opening the output file...  [0s]
Loading query sequences...  [0s]
Masking queries...  [0.007s]
Algorithm: Double-indexed
Building query histograms...  [0.005s]
Allocating buffers...  [0.003s]
Loading reference sequences...  [0s]
Masking reference...  [0.003s]
Initializing temporary storage...  [0.001s]
Building reference histograms...  [0.003s]
Allocating buffers...  [0.003s]
Processing query block 1, reference block 1/1, shape 1/64.
Building reference seed array...  [0.001s]
Building query seed array...  [0.001s]
Computing hash join...  [0s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.005s]
Processing query block 1, reference block 1/1, shape 2/64.
Building reference seed array...  [0s]
Building query seed array...  [0.001s]
Computing hash join...  [0s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.001s]
Processing query block 1, reference block 1/1, shape 3/64.
Building reference seed array...  [0s]
Building query seed array...  [0.001s]
Computing hash join... terminate called without an active exception
'
23:59:09.882 - WARNING - CDS - PSEUDO failed! diamond-error-code=-6
23:59:10.104 - INFO - MAIN - removed tmp dir: /tmpdata/tmpvjye3g74

I've also tried with --threads 24 and got the same error.

Thanks for your help with this!

Sep 19 '22 20:09 Ahmed-Shibl

Thanks @Ahmed-Shibl for reaching out and reporting! We noticed this issue ourselves only yesterday; it might be related to, and fixed by, #131. I've just released patch version v1.5.1. Could you please update to v1.5.1 and check whether the issue persists?

Thanks a lot and best regards!

Sep 20 '22 13:09 oschwengers

Hi, I've also recently run into a similar error on v1.5.1.

Last few lines of the log are:

10:24:13.110 - INFO - ANNOTATION - fix product: replace domain name underscores. new=Transket-pyr domain-containing protein, old=Transket_pyr domain-containing protein
10:24:13.110 - INFO - ANNOTATION - fix product: replace putative synonyms. new=putative oxaloacetate decarboxylase gamma chain, old=Probable oxaloacetate decarboxylase gamma chain
10:24:13.111 - INFO - ORF - write internal aa seqs: # seqs=278, path=/home/dfornika/tmp/auto-cpo-work/work-IDENTIFIER_REDACTED_routine-assembly_20221001094039/13/e00e655c5236a2abe9a5d9f97fff4d/tmp/tmpxsqa4yeo/cds.pseudo.candidates.faa
10:25:50.572 - WARNING - CDS - Diamond failed! diamond-error-code=-6

...and this is the error message on stderr:

Traceback (most recent call last):
  File "/home/dfornika/.conda/envs/bakta-ac53990dc7e86025e9d770b67eb80396/bin/bakta", line 10, in <module>
    sys.exit(main())
  File "/home/dfornika/.conda/envs/bakta-ac53990dc7e86025e9d770b67eb80396/lib/python3.10/site-packages/bakta/main.py", line 289, in main
    pseudo_candidates = feat_cds.predict_pseudo_candidates(hypotheticals)
  File "/home/dfornika/.conda/envs/bakta-ac53990dc7e86025e9d770b67eb80396/lib/python3.10/site-packages/bakta/features/cds.py", line 519, in predict_pseudo_candidates
    raise Exception(f'diamond error! error code: {proc.returncode}')
Exception: diamond error! error code: -6

Oct 02 '22 00:10 dfornika

Thanks @dfornika for reporting! Is this reproducible for a given genome? And how much RAM is available?

Oct 04 '22 12:10 oschwengers

@dfornika Could you please run Bakta setting the --debug flag and provide the *.log file? This will further help to pinpoint the issue.

Oct 04 '22 13:10 oschwengers

I'll collect that info and follow up as soon as I can.

Oct 04 '22 13:10 dfornika

It looked like it was a RAM issue for me because when I ran it again with higher memory it worked just fine. Thanks!

Oct 04 '22 13:10 Ahmed-Shibl

Thanks for the update @Ahmed-Shibl. Now, I'm curious if it's the same story for @dfornika. We often see Diamond issues with varying error codes, mainly 6 and 9. I think it would be very interesting and helpful to ask @bbuchfink if we could get a list of all Diamond error codes with their descriptions.

Oct 04 '22 14:10 oschwengers

I may be a bit slow to reproduce this and pass along more detailed info. But I can say that I was running bakta in the context of this pipeline: https://github.com/BCCDC-PHL/routine-assembly, on a new SLURM cluster where I believe each job gets 8 GB memory by default. So it's entirely possible that my issues are memory-related too. I may try throttling the memory down a bit to see if that triggers the error more consistently.

Oct 04 '22 19:10 dfornika
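
For anyone who wants to test the memory hypothesis without waiting for a scheduler to enforce a limit, one option is to run a command under an artificial memory cap. The following is only a minimal sketch, assuming a POSIX system: the helper name `run_with_memory_cap` and the stand-in workload are hypothetical and not part of Bakta or Diamond. Note also that `RLIMIT_AS` caps virtual address space, which is not the same accounting a SLURM cgroup applies, so a tool may fail at a different point than it would on the cluster.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd with its virtual address space capped at max_bytes (POSIX only)."""

    def apply_limit():
        # Executed in the child process just before exec.
        # Keep the existing hard limit and only lower the soft limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard == resource.RLIM_INFINITY:
            soft = max_bytes
        else:
            soft = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_limit, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in workload (hypothetical): a Python child that tries to allocate
    # 4 GiB while capped at 2 GiB. Replace it with the real command to test.
    hog = [sys.executable, "-c", "data = bytearray(4 * 1024**3)"]
    result = run_with_memory_cap(hog, 2 * 1024**3)
    print("return code:", result.returncode)
    print(result.stderr.strip().splitlines()[-1:] or "no stderr")
```

A non-zero or negative return code under the cap, combined with a clean run without it, would support a memory-related cause.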

The only exit codes I use in Diamond are 0 for success and 1 for failure. Other exit codes are generated by the OS, e.g. 9 should mean the process was killed for running out of memory.

Oct 06 '22 12:10 bbuchfink
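
The negative sign in "error code: -6" comes from Python rather than from Diamond: the tracebacks above print `proc.returncode`, and Python's `subprocess` module reports a child terminated by signal N as `-N` on POSIX systems. Signal 6 is SIGABRT on Linux, which fits the "terminate called without an active exception" line in the log (the C++ runtime calling abort), and signal 9 is SIGKILL, the signal the Linux out-of-memory killer sends. A small sketch for turning such codes into readable text follows; the function name `describe_returncode` is made up for illustration and is not part of Bakta.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a subprocess return code in words (POSIX conventions)."""
    if returncode == 0:
        return "success"
    if returncode > 0:
        return f"exited with status {returncode}"
    # A negative value -N means the child was terminated by signal N.
    signum = -returncode
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return f"terminated by signal {signum} ({name})"


if __name__ == "__main__":
    for code in (0, 1, -6, -9):
        print(code, "->", describe_returncode(code))
```

Logging this description alongside the raw code makes it easier to tell an abort inside the tool from a kill issued by the operating system or scheduler.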

Thank you so much @bbuchfink for the very quick reply! This info is very helpful for us to trace the cause of this and future issues.

Oct 07 '22 07:10 oschwengers

@dfornika, sorry for the late reply. 8 GB is definitely not enough for Diamond in the Bakta setup. For a high-throughput SLURM setup, I would recommend 8 threads and 16 GB of memory; in our SLURM experience this works very well for most genomes. Occasionally, we see executions that require even more, but these are very rare cases.

If an optimization of overall throughput is desired, one could reduce the cores from 8 to 4 since some internal steps do not scale very well. Of course, in turn this results in longer runtimes per genome and doesn't change the memory requirements.

Since I'm pretty sure that the initial issue is caused by these memory restrictions, I'll close this for now. However, if these issues persist even with more memory, please do not hesitate to re-open it. Thanks again and best regards!

Oct 18 '22 08:10 oschwengers
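
The recommendation above (8 threads, 16 GB per job) could be turned into a SLURM submission along the following lines. This is a sketch under stated assumptions, not the maintainers' setup: the helper name `build_sbatch_command` and the example paths are hypothetical, and it relies only on the standard `sbatch` options `--cpus-per-task`, `--mem`, and `--wrap` plus the Bakta options already shown in this thread.

```python
import shlex


def build_sbatch_command(genome, db, output, prefix, threads=8, mem_gb=16):
    """Build an sbatch command line that runs one Bakta job."""
    bakta_cmd = [
        "bakta",
        "--db", db,
        "--output", output,
        "--prefix", prefix,
        "--threads", str(threads),
        genome,
    ]
    return [
        "sbatch",
        f"--cpus-per-task={threads}",  # match SLURM cores to Bakta threads
        f"--mem={mem_gb}G",            # memory for the whole job
        f"--wrap={shlex.join(bakta_cmd)}",
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_sbatch_command("genome.fa", "/path/to/bakta/db", "results", "sample1")
    print(shlex.join(cmd))
    # To submit for real, pass the list to subprocess.run(cmd, check=True).
```

Lowering `threads` to 4, as suggested for throughput, changes only the core request; `mem_gb` should stay at 16 because the memory requirement does not shrink with fewer threads.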
