Retrieving more taxonomic IDs than the ones present in "prot.accession2taxid.FULL"
Hello,
I built a DIAMOND database using:
diamond makedb --in nr.gz --db nr_diamond --taxonmap prot.accession2taxid.FULL --taxonnodes nodes.dmp --taxonnames names.dmp --threads 72
and ran diamond blastp to get a tabular file with subject sequence IDs and matching taxonomic IDs. When I inspected some of the results, I found that even though there is only one matching taxonomic ID for a protein in "prot.accession2taxid.FULL" and on the NCBI website (for example, the tax ID for 'WP_119979703.1' is '2292949'), I got more than one entry for some proteins ("29523" and "2292949" for 'WP_119979703.1').
When I try to use MEGAN, the LCA algorithm may assign most of such entries to the root, losing taxonomic resolution. I cannot search "prot.accession2taxid.FULL" manually, because it would take ages. Can you help me understand the issue?
Best regards.
If you look up this protein on NCBI, you can see under identical proteins that there's an entry (MBO4974725.1) with taxon ID 29523. Identical sequences like these are merged into a single entry in the NR database, which is why both taxon IDs are reported for the hit. I don't have a good solution for this at the moment. Would an option to ignore all taxids above species rank help you?
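On the point about searching "prot.accession2taxid.FULL" by hand: a single streaming pass over the file can check many accessions at once, which may help confirm which taxon ID belongs to which merged accession. The following is only a minimal sketch, not part of DIAMOND or MEGAN. It assumes the file is tab-separated with the accession.version in the first column and the taxid in the second, and the function names lookup_taxids and lookup_in_file are hypothetical.

```python
import gzip
from collections import defaultdict


def lookup_taxids(lines, wanted):
    """Return {accession.version: set of taxids} for the accessions in `wanted`.

    `lines` is any iterable of text lines. Assumed layout (not verified here):
    tab-separated, accession.version in column 1, taxid in column 2.
    """
    wanted = set(wanted)
    found = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skips blank or malformed lines, the header line (its first field
        # is not a wanted accession), and accessions we did not ask for.
        if len(fields) < 2 or fields[0] not in wanted:
            continue
        found[fields[0]].add(fields[1])
    return dict(found)


def lookup_in_file(path, wanted):
    """Stream a plain or gzip-compressed mapping file without loading it into memory."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return lookup_taxids(handle, wanted)


# Example call (the path is an assumption about where the file lives):
# hits = lookup_in_file("prot.accession2taxid.FULL", {"WP_119979703.1", "MBO4974725.1"})
# for accession, taxids in hits.items():
#     print(accession, sorted(taxids))
```

Because the file is read line by line and only the requested accessions are kept, memory use stays small even though one pass over the whole file is still needed.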