taxizedb

`name2taxid` includes some questionable material

Open arendsee opened this issue 6 years ago • 4 comments

This happens:

> name2taxid("s2")
"164330"
> name2taxid("s2") %>% taxid2name
"Thauera aminoaromatica"

S2 is a strain name for this bacterium. See here.

The problem is that I allow matches against any name_class. Here are all the name classes in the database:

| name_class | count(name_class) |
| --- | --- |
| acronym | 1167 |
| anamorph | 302 |
| authority | 410075 |
| blast name | 229 |
| common name | 14204 |
| equivalent name | 25058 |
| genbank acronym | 486 |
| genbank anamorph | 107 |
| genbank common name | 28182 |
| genbank synonym | 2958 |
| in-part | 628 |
| includes | 36595 |
| misnomer | 1386 |
| misspelling | 35975 |
| scientific name | 1689025 |
| synonym | 168033 |
| teleomorph | 179 |
| type material | 11449 |

So the question is, which of these should we include?

Most of them seem pretty reasonable. The problematic ones are `type material` and `acronym`. Perhaps we should let the user choose which name classes to match against?
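To make the proposal concrete, here is a minimal sketch of what per-class filtering could look like. It does not use taxizedb's real internals: `names_tbl` is a toy data frame standing in for the NCBI names table, the helper `name2taxid_filtered` and its `name_classes` argument are hypothetical, and the `name_class` assigned to the "S2" row is a guess made for illustration only.

```r
# Toy stand-in for the NCBI names table (assumed columns: tax_id, name_txt,
# name_class). The rows are illustrative; the class of "S2" is a guess.
names_tbl <- data.frame(
  tax_id     = c("164330", "164330", "9606", "9606"),
  name_txt   = c("Thauera aminoaromatica", "S2", "Homo sapiens", "human"),
  name_class = c("scientific name", "type material",
                 "scientific name", "genbank common name"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive name lookup restricted to the
# name classes the caller allows. Returns NA for names with no match.
name2taxid_filtered <- function(x, names_tbl,
                                name_classes = c("scientific name", "synonym")) {
  keep <- names_tbl$name_class %in% name_classes
  candidates <- names_tbl[keep, , drop = FALSE]
  idx <- match(tolower(x), tolower(candidates$name_txt))
  candidates$tax_id[idx]
}

# With the default classes, the strain name no longer matches.
name2taxid_filtered("s2", names_tbl)

# Opting in to the extra class restores the old behaviour.
name2taxid_filtered("s2", names_tbl, name_classes = "type material")
```

Which classes belong in the default is exactly the open question here; the default in this sketch is arbitrary.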

arendsee · Mar 17 '18 00:03