
taxonomy notes

Open sckott opened this issue 6 years ago • 5 comments

I added some things to taxizedb. To install: remotes::install_github("ropensci/taxizedb@new-methods")

  • lowest_common - equivalent of the function of the same name in taxize; works with NCBI only for now
  • taxid2vernacular - get vernacular names from taxonomic IDs; works with NCBI only for now
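
For anyone unfamiliar with what a "lowest common" lookup does, here is a minimal base-R sketch of the idea. It is not the taxizedb or taxize implementation: the function name `lowest_common_sketch` and the simplified lineages are assumptions made up for illustration, and it works on plain character vectors rather than a database.

```r
# Illustrative sketch only; NOT the taxizedb/taxize code.
# Each lineage is a character vector ordered from root to tip.
lowest_common_sketch <- function(lineages) {
  stopifnot(length(lineages) >= 1)
  shortest <- min(lengths(lineages))
  common <- NA_character_
  for (i in seq_len(shortest)) {
    # taxon found at depth i in every lineage
    level <- vapply(lineages, function(l) l[[i]], character(1))
    if (length(unique(level)) > 1) break  # lineages diverge here
    common <- level[[1]]
  }
  common
}

# Simplified, hand-written lineages (hypothetical input)
lineages <- list(
  c("Eukaryota", "Chordata", "Mammalia", "Carnivora", "Felidae"),
  c("Eukaryota", "Chordata", "Mammalia", "Carnivora", "Canidae"),
  c("Eukaryota", "Chordata", "Mammalia", "Primates")
)
result <- lowest_common_sketch(lineages)
print(result)
```

The walk stops at the first depth where the lineages disagree, so the last shared taxon is returned.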

sckott, May 22 '18 17:05

cc @magpiedin @cboettig

sckott, May 22 '18 17:05

Thanks, @sckott! I think these will be handy when we set up some simple functions to let users opt in to linking IDs... issue forthcoming.

magpiedin, May 22 '18 18:05

Some initial thoughts on pulling together taxonomic info:

remotes::install_github("ropenscilabs/taxadc")
library(taxadc)
tdc_tax_strings(x = ex_taxonomy)
#> # A tibble: 1 x 3
#>   taxonID                                           scientificName                           taxonRank
#>   <chr>                                             <chr>                                    <chr>
#> 1 9681|4479|9681|4544|146712|146712|93036|9696|9696 Mammalia|Notoryctidae|Felidae|Notorycte… class|family|family|gen…
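
The output above packs several taxa into one row of pipe-delimited strings. A rough base-R sketch of that collapsing step, assuming a plain data frame with one row per taxon, might look like the following. This is not how taxadc does it internally: `collapse_tax_strings` is a hypothetical helper and the input values are made up.

```r
# Hypothetical input: one row per taxon (IDs are placeholders, not real taxon IDs)
taxa <- data.frame(
  taxonID = c("1", "2", "3"),
  scientificName = c("Mammalia", "Felidae", "Panthera"),
  taxonRank = c("class", "family", "genus"),
  stringsAsFactors = FALSE
)

# Collapse every column into a single delimited string, giving a one-row data frame
collapse_tax_strings <- function(df, sep = "|") {
  as.data.frame(lapply(df, paste, collapse = sep), stringsAsFactors = FALSE)
}

collapsed <- collapse_tax_strings(taxa)
print(collapsed)
```

Splitting a column back out is then a matter of `strsplit(collapsed$taxonID, "|", fixed = TRUE)`.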

thoughts?

sckott, May 22 '18 21:05

Sweet! (& sorry for molasses-paced follow-up!)

Those look like good pieces to include in a few spots! What goes into setting up the ex_taxonomy object?

...I'm thinking something along these lines might help set that up, and fit with what we had in mind:

  • Add prep_biblio.R (similar to prep_attributes() in prep.R) to get a unique list of taxa in a dataset & add them to the 'keyword' field in biblio.csv:
prep_biblio <- function(x, biblio_path) {
    # x: input dataset (data frame); biblio_path: path to biblio.csv
    # set-up & file-checking similar to prep_attributes() would go here

    biblio <- readr::read_csv(biblio_path, col_types = readr::cols())

    # unique values from every column whose name mentions "species" or "tax"
    taxon_cols <- grepl("species|tax", names(x), ignore.case = TRUE)
    taxonCoverage <- unique(unlist(lapply(x[taxon_cols], as.character), use.names = FALSE))
    # could/should distinguish between verbal-names & numeric-IDs here

    # append the taxa to any keywords already in the first row
    existing <- biblio$keywords[1]
    biblio$keywords[1] <- paste(c(existing[!is.na(existing)], taxonCoverage),
                                collapse = ", ")

    # could add conditional logic if there are more than 5(?) taxa...
    # e.g., use lowest_common(taxonCoverage, db = "col", rows = 1)$name
    # ...or does that invite problems/errors/fun?

    readr::write_csv(biblio, biblio_path)

    message("The following taxonCoverage has been added to 'keywords' in the biblio file: ",
            "\n \n", paste(taxonCoverage, collapse = ", "))
    message("To link scientific names to taxonIDs, try taxize::gnr_resolve")
}
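
The comment in the function above about distinguishing verbal names from numeric IDs could be handled with a small helper along these lines. This is only a sketch: `split_taxa` is a hypothetical name, and it assumes that an ID is any value made up entirely of digits, which will not hold for every taxonomic database.

```r
# Hypothetical helper: separate all-digit IDs from everything else.
# Assumption: numeric IDs consist only of digits (true for NCBI-style IDs).
split_taxa <- function(values) {
  values <- unique(as.character(values))
  values <- values[!is.na(values) & nzchar(values)]  # drop NA and empty strings
  is_id <- grepl("^[0-9]+$", values)
  list(ids = values[is_id], names = values[!is_id])
}

parts <- split_taxa(c("9681", "Felidae", "Felidae", "9696", NA))
print(parts)
```

The `names` element could go into the keywords field as-is, while `ids` could be passed to an ID-to-name lookup first.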

...If that's worth a try/doesn't deviate wildly from what we discussed, I'll add it in. (just being cautious, in awe of others' git/devtools/travis-fu skills)

And appending taxa to the biblio.csv 'keyword' field for now seemed worthwhile to me unless there's a more appropriate way -- e.g., for schema.org/JSON-LD to reference the 'taxonCoverage' field in EML/Audubon Core/other standards.

We had also talked about setting up a separate taxa.csv template to accommodate kingdom/phylum/etc. for each [potentially multiple] taxon identified in a dataset. Does that still sound right, @cboettig & all? (If so, just a heads-up in case adding a 5th template would throw off any functions in their current state.)
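
To make the taxa.csv idea concrete, here is one possible shape for such a template, sketched in base R. The column names, the set of ranks, and the helper `make_taxa_template` are all assumptions for discussion, not an agreed dataspice format.

```r
# Hypothetical taxa.csv template: one row per taxon ID, one empty column per rank.
# Column names are placeholders, not a dataspice specification.
make_taxa_template <- function(taxon_ids) {
  ranks <- c("kingdom", "phylum", "class", "order", "family", "genus")
  template <- data.frame(taxonID = as.character(taxon_ids),
                         stringsAsFactors = FALSE)
  template[ranks] <- NA_character_  # to be filled in by the user or a lookup
  template
}

taxa_template <- make_taxa_template(c("9681", "9696"))

# Write to a temporary location so the sketch does not touch a real project
taxa_path <- file.path(tempdir(), "taxa.csv")
write.csv(taxa_template, taxa_path, row.names = FALSE)
print(taxa_template)
```

Keeping one row per taxon means a dataset with several taxa simply gets several rows, which the keyword field in biblio.csv cannot express.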

magpiedin, May 28 '18 17:05

@magpiedin the ex_taxonomy object is an example taxmap object in the taxa package

sckott, Jun 11 '18 21:06