ARO accession numbers missing from CARD database

Open: tiagofilipe12 opened this issue 6 years ago • 1 comment

While using abricate to get resistance genes out of a set of sequences, I noticed that CARD entries in the report file carry NCBI accessions rather than ARO accessions (which is fine, of course). However, ARO accessions are also very useful when working with CARD because they let you link directly to the CARD website. I started digging through your FASTA files and found the one for CARD, where each entry looks like this one:

>card~~~AAC(1)~~~HM036080:132-597 Acetylation of paromomycin, and apramycin, on the amino group at position 1 in E. coli, Actinomycete, Campylobacter spp.

So there is no ARO accession at all. For my specific problem, I therefore built a dictionary that maps NCBI accession numbers to their respective ARO accessions using CARD's files (available here). I used aro_index.csv, but the original CARD FASTA files already have headers like this:

>gb|GQ343019|+|132-1023|ARO:3002999|CblA-1 [mixed culture bacterium AX_gF3SD01_15]

Perhaps you can find a way to keep these "links" in your CARD FASTA database.
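
For anyone who needs the same workaround, here is a minimal Python sketch of such a lookup. It assumes that every header follows one of the two layouts quoted above, and it reads the ARO accession from the original CARD FASTA headers rather than from aro_index.csv. The function names are hypothetical and are not part of abricate or CARD. Each NCBI accession maps to a set because one NCBI record may carry more than one gene.

```python
def parse_card_header(header):
    """Return (ncbi_accession, aro_accession) from an original CARD header.

    Assumed layout, taken from the example in this issue:
      >gb|GQ343019|+|132-1023|ARO:3002999|CblA-1 [organism]
    """
    fields = header.lstrip(">").split("|")
    if len(fields) < 6 or not fields[4].startswith("ARO:"):
        raise ValueError("unexpected CARD header: " + header)
    return fields[1], fields[4]


def parse_abricate_header(header):
    """Return (gene, ncbi_accession) from an abricate CARD header.

    Assumed layout, taken from the example in this issue:
      >card~~~AAC(1)~~~HM036080:132-597 free-text description
    """
    # The identifier is the first whitespace-delimited token.
    ident = header.lstrip(">").split(None, 1)[0]
    _db, gene, acc_range = ident.split("~~~", 2)
    # Drop the ":start-end" coordinates to keep only the accession.
    return gene, acc_range.split(":", 1)[0]


def build_aro_lookup(card_headers):
    """Map each NCBI accession to the set of ARO accessions seen for it."""
    lookup = {}
    for header in card_headers:
        accession, aro = parse_card_header(header)
        lookup.setdefault(accession, set()).add(aro)
    return lookup


if __name__ == "__main__":
    card = [">gb|GQ343019|+|132-1023|ARO:3002999|CblA-1 "
            "[mixed culture bacterium AX_gF3SD01_15]"]
    lookup = build_aro_lookup(card)
    gene, accession = parse_abricate_header(
        ">card~~~AAC(1)~~~HM036080:132-597 Acetylation of paromomycin")
    # An empty set means the accession was not among the CARD headers given.
    print(gene, accession, lookup.get(accession, set()))
```

Keying on the accession alone is a simplification. If the coordinates in the two files turn out to match, keying on the accession plus the range would separate genes that share an NCBI record.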

tiagofilipe12, Nov 07 '17 09:11

Yes, you are right that I do not keep the ARO numbers from the CARD input file.

I think I will add a new column called DATABASE_ACCESSION for this purpose.

tseemann, Mar 18 '18 00:03