dataspice
taxonomy notes
i added some things to taxizedb: install like remotes::install_github("ropensci/taxizedb@new-methods")
- lowest_common - equiv of same fxn name in taxize, works with ncbi only for now
- taxid2vernacular - get vernacular names from taxonomic ids, works for ncbi only for now
cc @magpiedin @cboettig
Thanks, @skott! I think these will be handy when we set up some simple functions to let users opt in to linking id's...issue forthcoming..
Some initial thoughts on pulling together taxonomic info:
remotes::install_github("ropenscilabs/taxadc")
library(taxadc)
tdc_tax_strings(x = ex_taxonomy)
#> # A tibble: 1 x 3
#> taxonID scientificName taxonRank
#> <chr> <chr> <chr>
#> 1 9681|4479|9681|4544|146712|146712|93036|9696|9696 Mammalia|Notoryctidae|Felidae|Notorycte… class|family|family|gen…
thoughts?
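In case one row per taxon is easier to work with than the pipe-delimited strings, here is a minimal base-R sketch of splitting a one-row table of that shape. The helper name split_tax_strings is hypothetical, and the input values are made-up placeholders, not real tdc_tax_strings() output.

```r
# Hypothetical helper (not part of taxadc): expand pipe-delimited columns
# from a one-row table into one row per taxon.
split_tax_strings <- function(df) {
  parts <- lapply(df, function(col) {
    strsplit(as.character(col), "|", fixed = TRUE)[[1]]
  })
  # every column must split into the same number of pieces
  stopifnot(length(unique(lengths(parts))) == 1)
  as.data.frame(parts, stringsAsFactors = FALSE)
}

# Placeholder input shaped like the output above (values are illustrative only)
tax_strings <- data.frame(
  taxonID = "1|2|3",
  scientificName = "Aaa|Bbb|Ccc ddd",
  taxonRank = "class|family|species",
  stringsAsFactors = FALSE
)
long <- split_tax_strings(tax_strings)
```

The length check is there because misaligned strings (say, a missing rank) would otherwise be recycled silently by as.data.frame() or fail with a less helpful error.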
Sweet! (& sorry for molasses-paced follow-up!)
Those look like good pieces to include in a few spots! What goes into setting up the ex_taxonomy object?
...I'm thinking something along these lines might help set that up, and fit with what we had in mind:
- Add prep_biblio.R (similar to prep_attributes() in prep.R) to get a unique list of taxa in a dataset & add them to the 'keyword' field in biblio.csv:
prep_biblio <- function(x, biblio_path) {
  # x: input dataset (data frame); biblio_path: path to biblio.csv
  # set-up & file-checking similar to prep_attributes()
  biblio <- readr::read_csv(biblio_path, col_types = readr::cols())
  # columns whose names suggest taxa, flattened to a unique character vector
  taxon_cols <- x[grepl("species|tax", names(x), ignore.case = TRUE)]
  taxonCoverage <- unique(unlist(lapply(taxon_cols, as.character), use.names = FALSE))
  # could/should distinguish between verbal-names & numeric-IDs here
  biblio$keywords[1] <- paste(c(biblio$keywords[1], taxonCoverage),
                              collapse = ", ")
  # could add conditional logic if there are more than 5(?) taxa...
  # e.g., use lowest_common(taxonCoverage, db = "col", rows = 1)$name
  # ...or does that invite problems/errors/fun?
  readr::write_csv(biblio, biblio_path)
  message("The following taxonCoverage has been added to 'keywords' in the biblio file: ",
          "\n \n", paste(taxonCoverage, collapse = ", "))
  message("To link scientific names to taxonIDs, try taxize::gnr_resolve")
}
...If that's worth a try/doesn't deviate wildly from what we discussed, I'll add it in. (just being cautious, in awe of others' git/devtools/travis-fu skills)
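On the "distinguish between verbal-names & numeric-IDs" comment in the function, one possible base-R sketch follows. The helper name split_names_ids is hypothetical, and it assumes IDs are digits only (NCBI-style); identifiers from other databases may not fit that pattern. The example values are illustrative.

```r
# Hypothetical helper: separate digit-only taxon IDs from verbal names.
# Assumption: an entry made up entirely of digits is an ID.
split_names_ids <- function(taxa) {
  taxa <- unique(trimws(as.character(taxa)))
  taxa <- taxa[!is.na(taxa) & nzchar(taxa)]  # drop missing and empty entries
  is_id <- grepl("^[0-9]+$", taxa)
  list(ids = taxa[is_id], names = taxa[!is_id])
}

res <- split_names_ids(c("Felis catus", "9685", " Panthera leo", "9689", NA))
```

The names could then go into 'keywords' as-is, with the IDs held back for whatever opt-in linking step we end up with.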
And appending taxa to the biblio.csv 'keyword' field for now seemed worthwhile to me unless there's a more appropriate way -- e.g., for schema.org/JSON-LD to reference the 'taxonCoverage' field in EML/Audubon Core/other standards.
We had also talked about setting up a separate taxa.csv template to accommodate kingdom/phylum/etc for each [potentially multiple] taxon id'ed in a dataset. Does that still sound right? @cboettig & all (if so, just a heads-up in case adding a 5th template would throw off any functions in their current state)
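To make the taxa.csv idea concrete, a rough base-R sketch of what an empty template could look like: one row per unique taxon, with blank rank columns to fill in. The function name make_taxa_template, the column names, and the set of ranks are all assumptions for illustration, not an agreed layout.

```r
# Hypothetical template builder: one row per unique taxon,
# with rank columns left blank for the user (or a lookup) to fill in.
make_taxa_template <- function(taxa) {
  ranks <- c("kingdom", "phylum", "class", "order", "family", "genus")
  out <- data.frame(
    taxon = unique(taxa),
    taxonID = NA_character_,
    stringsAsFactors = FALSE
  )
  out[ranks] <- NA_character_  # add one empty column per rank
  out
}

template <- make_taxa_template(c("Felis catus", "Panthera leo", "Felis catus"))
# To write it out next to the other templates (path is a placeholder):
# utils::write.csv(template, "taxa.csv", row.names = FALSE, na = "")
```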
@magpiedin the ex_taxonomy object is an example taxmap object in the taxa package