taxizedb
`name2taxid` includes some questionable material
This happens:
> name2taxid("s2")
"164330"
> name2taxid("s2") %>% taxid2name
"Thauera aminoaromatica"
S2 is a strain name for this bacterium. See here.
The problem is that I allow matches against any `name_class`. Here are all the name classes in the database:
name_class | count(name_class) |
---|---|
acronym | 1167 |
anamorph | 302 |
authority | 410075 |
blast name | 229 |
common name | 14204 |
equivalent name | 25058 |
genbank acronym | 486 |
genbank anamorph | 107 |
genbank common name | 28182 |
genbank synonym | 2958 |
in-part | 628 |
includes | 36595 |
misnomer | 1386 |
misspelling | 35975 |
scientific name | 1689025 |
synonym | 168033 |
teleomorph | 179 |
type material | 11449 |
So the question is, which of these should we include?
Most of them seem pretty reasonable. The problematic ones are `type material` and `acronym`. Perhaps we should allow the user to select which name classes to allow?
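
A rough sketch of what user-selectable name classes could look like, using a toy data frame rather than the real database. The helper `name2taxid_filtered`, its `exclude` argument, and the toy rows are all hypothetical. In particular, assigning `S2` the class `type material` is an assumption for illustration, as is the column layout (`tax_id`, `name_txt`, `name_class`).

```r
# Toy stand-in for the names table. The rows, and the class given to "S2",
# are illustrative assumptions, not a dump of the real database.
toy_names <- data.frame(
  tax_id     = c("164330", "164330", "9606", "9606"),
  name_txt   = c("Thauera aminoaromatica", "S2", "Homo sapiens", "human"),
  name_class = c("scientific name", "type material",
                 "scientific name", "genbank common name"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive exact match on the name,
# skipping any row whose name_class is listed in `exclude`.
name2taxid_filtered <- function(name, names_tbl,
                                exclude = c("type material", "acronym")) {
  hit <- tolower(names_tbl$name_txt) == tolower(name) &
    !(names_tbl$name_class %in% exclude)
  unique(names_tbl$tax_id[hit])
}

# With the default exclusions, "s2" no longer matches anything.
name2taxid_filtered("s2", toy_names)

# Passing an empty exclusion list restores the current behaviour.
name2taxid_filtered("s2", toy_names, exclude = character(0))

# Common names are unaffected by the default exclusions.
name2taxid_filtered("human", toy_names)
```

Defaulting to an exclusion list rather than an allow list keeps existing lookups working for every other name class, while still letting the user opt back in to `type material` or `acronym` matches.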