Kleborate

Error occurring for some FASTA files

Open karubiotools opened this issue 3 years ago • 2 comments

Dear Developers,

Could you please tell me why the following error occurs when I run Kleborate with the '--all' option on some FASTA files?

strain species ST virulence_score resistance_score Yersiniabactin YbST Colibactin CbST Aerobactin AbST Salmochelin SmST RmpADC RmST rmpA2 wzi K_locus K_locus_confidence O_locus O_locus_confidence AGly_acquired Col_acquired Fcyn_acquired Flq_acquired Gly_acquired MLS_acquired Phe_acquired Rif_acquired Sul_acquired Tet_acquired Tgc_acquired Tmt_acquired Bla_acquired Bla_inhR_acquired Bla_ESBL_acquired Bla_ESBL_inhR_acquired Bla_Carb_acquired Bla_chr SHV_mutations Omp_mutations Col_mutations Flq_mutations truncated_resistance_hits spurious_resistance_hits
Traceback (most recent call last):
  File "/soft/miniconda3/bin/kleborate", line 33, in <module>
    sys.exit(load_entry_point('Kleborate==2.2.0', 'console_scripts', 'kleborate')())
  File "/soft/miniconda3/lib/python3.9/site-packages/kleborate/main.py", line 64, in main
    results.update(get_resistance_results(data_folder, contigs, args, res_headers,
  File "/soft/miniconda3/lib/python3.9/site-packages/kleborate/main.py", line 570, in get_resistance_results
    res_hits = resblast_one_assembly(contigs, gene_info, qrdr, trunc, omp, seqs,
  File "/soft/miniconda3/lib/python3.9/site-packages/kleborate/resBLAST.py", line 32, in resblast_one_assembly
    hits_dict = blast_against_all(seqs, min_cov, min_ident, contigs, gene_info,
  File "/soft/miniconda3/lib/python3.9/site-packages/kleborate/resBLAST.py", line 125, in blast_against_all
    hit_allele, hit_class, hit_bla_class = gene_info[hit.gene_id]
KeyError: '403__TetX_Tet__tet(X6)__2434'

Thank you in advance for your help.

Best regards,
David

karubiotools, Jun 20 '22

That's because your "gene_info" did not match the "gene_id" in your *.csv file. Look back at the "gene_id" in your database file; I think you need to fix that ID slightly before running it again. Hope it helps.
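To make that explanation concrete, here is a minimal sketch of the failing lookup. It assumes, for illustration only, that `gene_info` behaves like a plain dict keyed by sequence ID; the key `403__TetX_Tgc__tet(X6)__2434` and the helper `find_unmatched_ids` are hypothetical and not taken from Kleborate's code. Any hit ID that is not a key raises the same kind of `KeyError` seen in the traceback, and a set difference lists all such IDs before a full run.

```python
def find_unmatched_ids(fasta_ids, gene_info):
    """Return the FASTA-derived IDs that have no entry in gene_info, sorted."""
    return sorted(set(fasta_ids) - set(gene_info))


# Hypothetical gene_info entry (assumed to be keyed with the CSV's class, Tgc).
gene_info = {
    "403__TetX_Tgc__tet(X6)__2434": ("tet(X6)", "Tgc", None),
}

# Hypothetical hit ID taken from the FASTA header (which says Tet, not Tgc).
fasta_ids = ["403__TetX_Tet__tet(X6)__2434"]

print(find_unmatched_ids(fasta_ids, gene_info))
# ['403__TetX_Tet__tet(X6)__2434']

try:
    # Same unpacking pattern as line 125 of resBLAST.py in the traceback.
    hit_allele, hit_class, hit_bla_class = gene_info[fasta_ids[0]]
except KeyError as err:
    print("KeyError:", err)
# KeyError: '403__TetX_Tet__tet(X6)__2434'
```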

nquynh8991, Oct 24 '22

This exact error just occurred for me on one sequence out of a set of more than 800. Thanks to @nquynh8991's comment, I tracked it down to a typo in the headers of two sequences in the CARD database that ships with the latest version of Kleborate.

CARD_v3.0.8.fasta contains the headers:

402__TetX_Tet__tet(X5)__2433
403__TetX_Tet__tet(X6)__2434

CARD_AMR_clustered.csv contains the entries:

402,tet(X5),Tgc,TetX,tet(X5),2433,ARO_3005057,-,-,no,no,NA,NA
403,tet(X6),Tgc,TetX,tet(X6),2434,ARO_3005056,-,-,no,no,NA,NA

The difference is the specified antibiotic class (Tet vs Tgc). I believe these should be Tgc in the FASTA headers, consistent with the CSV file, since these TetX variants are associated with tigecycline resistance.
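The same check can be run over the whole database. Judging only from the two example rows, each FASTA header seems to be assembled from CSV fields as cluster ID, then gene family joined to class with `_`, then allele, then sequence ID, separated by `__`. That layout is an inference, not documented Kleborate behavior; columns 2 and 5 both read `tet(X5)` here, so which one feeds the header is a guess. This sketch rebuilds the expected header for each CSV row and reports the ones absent from the FASTA; `expected_header` and `missing_headers` are hypothetical names.

```python
import csv
import io


def expected_header(row):
    """Build the header expected for one CSV row (ASSUMED layout, inferred from two examples)."""
    cluster, _, drug_class, family, allele, seq_id = row[:6]
    return f"{cluster}__{family}_{drug_class}__{allele}__{seq_id}"


def missing_headers(csv_text, fasta_text):
    """Return headers expected from the CSV that do not occur in the FASTA."""
    # Sequence ID = first whitespace-separated token after '>'.
    fasta_headers = {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">")
    }
    missing = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines and any header row (first field is not a numeric cluster ID).
        if not row or not row[0].isdigit():
            continue
        header = expected_header(row)
        if header not in fasta_headers:
            missing.append(header)
    return missing


csv_text = (
    "402,tet(X5),Tgc,TetX,tet(X5),2433,ARO_3005057,-,-,no,no,NA,NA\n"
    "403,tet(X6),Tgc,TetX,tet(X6),2434,ARO_3005056,-,-,no,no,NA,NA\n"
)
# 'ACGT' is placeholder sequence, not real CARD data.
fasta_text = (
    ">402__TetX_Tet__tet(X5)__2433\nACGT\n"
    ">403__TetX_Tet__tet(X6)__2434\nACGT\n"
)

for header in missing_headers(csv_text, fasta_text):
    print("expected but not found:", header)
```

With the real files, `csv_text` and `fasta_text` would come from reading `CARD_AMR_clustered.csv` and `CARD_v3.0.8.fasta`; any reported header points at a row whose FASTA entry disagrees with the CSV, under the assumed layout.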

I fixed this by editing the CARD FASTA file and rebuilding the BLAST database from it. It would be good to fix this typo in a future Kleborate or database release!
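The header edit described above can be scripted so it is repeatable. This is a minimal sketch: `HEADER_FIXES`, `fix_fasta_headers` and `fix_fasta_file` are hypothetical names, and the mapping covers only the two headers reported in this thread. It rewrites the sequence ID on matching header lines, leaves everything else untouched, and keeps a `.bak` copy of the original file. The BLAST database then has to be rebuilt from the edited file with NCBI's `makeblastdb`, for example `makeblastdb -in CARD_v3.0.8.fasta -dbtype nucl`; the nucleotide `-dbtype` is an assumption, so match whatever options the existing database was built with.

```python
from pathlib import Path

# The two headers reported in this thread as carrying "Tet" where the CSV says "Tgc".
HEADER_FIXES = {
    "402__TetX_Tet__tet(X5)__2433": "402__TetX_Tgc__tet(X5)__2433",
    "403__TetX_Tet__tet(X6)__2434": "403__TetX_Tgc__tet(X6)__2434",
}


def fix_fasta_headers(fasta_text, fixes):
    """Replace the sequence ID on header lines found in `fixes`; other lines are unchanged."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Split the ID from any description that follows the first space.
            seq_id, sep, description = line[1:].partition(" ")
            line = ">" + fixes.get(seq_id, seq_id) + sep + description
        out.append(line)
    return "\n".join(out) + "\n"


def fix_fasta_file(path, fixes):
    """Edit a FASTA file in place, writing the original to '<name>.bak' first."""
    path = Path(path)
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(fix_fasta_headers(original, fixes))
```

Usage would be `fix_fasta_file("CARD_v3.0.8.fasta", HEADER_FIXES)`, run against the copy of the file in your Kleborate data folder, followed by the `makeblastdb` rebuild.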

jenny-draper, Jan 17 '23