Error occurring for some FASTA files
Dear Developers,
I would like to know why the following error occurs when I run Kleborate with the '--all' option on some FASTA files, please:
"
strain species ST virulence_score resistance_score Yersiniabactin YbST Colibactin CbST Aerobactin AbST Salmochelin SmST RmpADC RmST rmpA2 wzi K_locus K_locus_confidence O_locus O_locus_confidence AGly_acquired Col_acquired Fcyn_acquired Flq_acquired Gly_acquired MLS_acquired Phe_acquired Rif_acquired Sul_acquired Tet_acquired Tgc_acquired Tmt_acquired Bla_acquired Bla_inhR_acquired Bla_ESBL_acquired Bla_ESBL_inhR_acquired Bla_Carb_acquired Bla_chr SHV_mutations Omp_mutations Col_mutations Flq_mutations truncated_resistance_hits spurious_resistance_hits
Traceback (most recent call last):
File "/soft/miniconda3/bin/kleborate", line 33, in
That's because your "gene_info" does not match the "gene_id" in your *.csv file. Take another look at the "gene_id" in your database file; I think you need to fix that ID before running it again. Hope it helps.
This exact error just occurred for me on one sequence out of a set of more than 800. Thanks to @nquynh8991's comment, I tracked it down to a typo in two sequences of the CARD database that ships with the latest version of Kleborate here.
CARD_v3.0.8.fasta contains the headers:
402__TetX_Tet__tet(X5)__2433
403__TetX_Tet__tet(X6)__2434
CARD_AMR_clustered.csv contains the entries:
402,tet(X5),Tgc,TetX,tet(X5),2433,ARO_3005057,-,-,no,no,NA,NA
403,tet(X6),Tgc,TetX,tet(X6),2434,ARO_3005056,-,-,no,no,NA,NA
The difference is the antibiotic class: Tet in the FASTA headers versus Tgc in the CSV. I believe these should be Tgc in the FASTA headers as well, consistent with the CSV file, since these TetX variants are associated with tigecycline resistance.
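A quick way to check whether any other entries disagree is to compare the class field of every FASTA header with the CSV. This is only a sketch: it assumes the header layout <cluster_id>__<gene>_<class>__<allele>__<seq_id> inferred from the two headers quoted above, and that the CSV's first column is the cluster ID and its third column the class. Neither layout is documented in this thread, and parse_header and find_mismatches are hypothetical helpers, not part of Kleborate.

```python
import csv
import io


def parse_header(header):
    """Split a header like '402__TetX_Tet__tet(X5)__2433' into its fields.

    Assumed (not documented) layout: <cluster_id>__<gene>_<class>__<allele>__<seq_id>.
    The class is taken to be the text after the last underscore of field 2.
    """
    cluster_id, gene_class, allele, seq_id = header.lstrip(">").split("__")
    gene, drug_class = gene_class.rsplit("_", 1)
    return cluster_id, gene, drug_class, allele, seq_id


def find_mismatches(fasta_headers, csv_text):
    """Return (cluster_id, fasta_class, csv_class) for every disagreement.

    Assumes CSV column 0 is the cluster ID and column 2 the antibiotic class.
    A CSV header row, if present, simply never matches a cluster ID.
    """
    csv_class = {row[0]: row[2] for row in csv.reader(io.StringIO(csv_text)) if row}
    mismatches = []
    for header in fasta_headers:
        cluster_id, _gene, drug_class, _allele, _seq_id = parse_header(header)
        expected = csv_class.get(cluster_id)
        if expected is not None and expected != drug_class:
            mismatches.append((cluster_id, drug_class, expected))
    return mismatches


headers = [">402__TetX_Tet__tet(X5)__2433", ">403__TetX_Tet__tet(X6)__2434"]
csv_text = (
    "402,tet(X5),Tgc,TetX,tet(X5),2433,ARO_3005057,-,-,no,no,NA,NA\n"
    "403,tet(X6),Tgc,TetX,tet(X6),2434,ARO_3005056,-,-,no,no,NA,NA\n"
)
print(find_mismatches(headers, csv_text))
```

For the two entries quoted above it prints [('402', 'Tet', 'Tgc'), ('403', 'Tet', 'Tgc')]. Running it on the real files would mean collecting the '>' lines of CARD_v3.0.8.fasta and passing in the text of CARD_AMR_clustered.csv.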
I solved this by editing the CARD FASTA file and recreating the BLAST database from it. It would be good to fix this typo in a future Kleborate or database release!
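The header edit itself can be scripted rather than done by hand. The function below is a hypothetical sketch, assuming the same <cluster_id>__<gene>_<class>__<allele>__<seq_id> header layout seen in the quoted headers: it replaces only the class part of the second field for the listed cluster IDs and passes sequence lines through untouched. The BLAST database still has to be regenerated from the edited file afterwards, as described above; that step is not shown here.

```python
def fix_class_in_headers(fasta_text, corrections):
    """Rewrite the class field in selected FASTA headers.

    corrections maps a cluster ID (e.g. '402') to the class the CSV expects
    (e.g. 'Tgc'). Assumed header layout:
    ><cluster_id>__<gene>_<class>__<allele>__<seq_id>
    Sequence lines and non-matching headers are returned unchanged.
    """
    out_lines = []
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            body = line.rstrip("\r\n")
            ending = line[len(body):]  # keep the original line ending, if any
            fields = body[1:].split("__")
            if len(fields) == 4 and fields[0] in corrections:
                gene, _old_class = fields[1].rsplit("_", 1)
                fields[1] = f"{gene}_{corrections[fields[0]]}"
                line = ">" + "__".join(fields) + ending
        out_lines.append(line)
    return "".join(out_lines)


example = (
    ">402__TetX_Tet__tet(X5)__2433\nATGC\n"
    ">403__TetX_Tet__tet(X6)__2434\nGGCC\n"
)
print(fix_class_in_headers(example, {"402": "Tgc", "403": "Tgc"}), end="")
```

Keeping the corrections as an explicit mapping, rather than blindly replacing every "_Tet__" substring, limits the change to the two entries known to be wrong.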