taxizedb
Tools for Working with Taxonomic SQL Databases
taxizedb - Tools for Working with Taxonomic Databases on your machine
Docs: https://docs.ropensci.org/taxizedb/
taxize is a heavily used taxonomic toolbelt package in R. However, it makes web requests for nearly all methods. That is fine for most cases, but when the user has a very large number of names it is much more efficient to query a local SQL database.
Data sources
Not all taxonomic databases are publicly available, or possible to convert into a SQL version. Taxonomic databases supported:
- NCBI: text files are provided by NCBI, which we stitch into a SQLite database
- ITIS: they provide a SQLite dump, which we use here
- The PlantList: created by stitching together CSV files. This source is no longer updated as far as we can tell; they say they have moved focus to the World Flora Online
- Catalogue of Life: created from a Darwin Core Archive dump
- GBIF: created from a Darwin Core Archive dump. Right now we only have the taxonomy table (called gbif), but will add the other tables in the Darwin Core Archive later
- Wikidata: aggregated taxonomy of Open Tree of Life, GloBI and Wikidata. Hosted on Zenodo, created by Jorrit Poelen of GloBI
- World Flora Online: http://www.worldfloraonline.org/
Update schedule for databases:
- NCBI: since db_download_ncbi creates the database when the function is called, it is updated whenever you run the function
- ITIS: since ITIS provides the SQLite database as a download, you can delete the old file and run db_download_itis to get a new dump; we think they update the dumps every month or so
- The PlantList: no longer updated, so you shouldn't need to download this again after the first download. Hosted on Amazon S3
- Catalogue of Life: a GitHub Actions job runs once a day at 00:00 UTC, building the latest COL data into a SQLite database that is hosted on Amazon S3
- GBIF: a GitHub Actions job runs once a day at 00:00 UTC, building the latest GBIF data into a SQLite database that is hosted on Amazon S3
- Wikidata: last updated April 6, 2018. Scripts are available to update the data if you prefer to do it yourself.
- World Flora Online: since db_download_wfo creates the database when the function is called, it is updated whenever you run the function
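As a rough illustration of the download and refresh workflow described above, the sketch below uses NCBI. It assumes your installed version of taxizedb has the overwrite argument on db_download_ncbi and exposes the cache_path_get() and list() methods on tdb_cache; check ?db_download_ncbi and ?tdb_cache if in doubt.

```r
library(taxizedb)

# Directory where taxizedb keeps its cached database files
tdb_cache$cache_path_get()

# Files currently in the cache (empty before the first download)
tdb_cache$list()

# Build the NCBI SQLite database and return the path to the file.
# Assumption: with overwrite = FALSE an existing cached copy is reused;
# set overwrite = TRUE to force a rebuild from the current NCBI files.
ncbi_path <- db_download_ncbi(verbose = TRUE, overwrite = FALSE)
ncbi_path
```

The other db_download_* functions follow the same pattern. The first download can take a while and needs a network connection.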
Links:
- NCBI: ftp://ftp.ncbi.nih.gov/pub/taxonomy/
- ITIS: https://www.itis.gov/downloads/index.html
- The PlantList: http://www.theplantlist.org/
- Catalogue of Life: latest monthly edition via https://www.catalogueoflife.org/data/download
- GBIF: http://rs.gbif.org/datasets/backbone/
- Wikidata: https://zenodo.org/record/1213477
- World Flora Online: http://www.worldfloraonline.org/
Get in touch in the issues with any ideas on new data sources.
All databases are SQLite.
Package API
For each data source, this package performs the following tasks:
- Download taxonomic databases - db_download_*
- Create dplyr SQL backend via dbplyr::src_dbi - src_*
- Query and get data back into a data.frame - sql_collect
- Manage cached database files - tdb_cache
- Retrieve immediate descendants of a taxon - children
- Retrieve the taxonomic hierarchies from local database - classification
- Retrieve all taxa descending from a vector of taxa - downstream
- Convert species names to taxon IDs - name2taxid
- Convert taxon IDs to species names - taxid2name
- Convert taxon IDs to ranks - taxid2rank
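The functions listed above can be combined roughly as follows. This is a minimal sketch, assuming the NCBI database has already been downloaded with db_download_ncbi and that each function accepts db = "ncbi". The two genus names are arbitrary examples.

```r
library(taxizedb)

# Make sure the NCBI database is available locally
db_download_ncbi()

# Names to taxon IDs (one ID per input name)
ids <- name2taxid(c("Arabidopsis", "Peromyscus"), db = "ncbi")

# Taxon IDs back to names and ranks
taxid2name(ids, db = "ncbi")
taxid2rank(ids, db = "ncbi")

# Full taxonomic hierarchy for each ID
classification(ids, db = "ncbi")

# Immediate descendants of the first taxon
children(ids[1], db = "ncbi")

# All taxa descending from the first taxon
downstream(ids[1], db = "ncbi")
```

All of these calls read from the local SQLite file, so no web requests are made after the download step.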
You can use the src connections with dplyr, etc. to do operations downstream. Or use the database connection to do raw SQL queries.
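Both approaches might look like the sketch below. It assumes the NCBI database contains a table called names; the table name is an assumption here, so inspect the database if the query fails.

```r
library(taxizedb)
library(dplyr)

# Download (or reuse) the NCBI database and open a src connection to it
ncbi_path <- db_download_ncbi()
src <- src_ncbi(ncbi_path)

# Raw SQL: run a query and collect the result into a data.frame
# ("names" is an assumed table name)
first_names <- sql_collect(src, "SELECT * FROM names LIMIT 5")
first_names

# dplyr: build the query lazily, then pull the rows into R with collect()
src %>%
  tbl("names") %>%
  head(5) %>%
  collect()
```

The dplyr version is translated to SQL by dbplyr and only executed when collect() is called, so filters are applied inside SQLite rather than in R.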
Install
CRAN version
install.packages("taxizedb")
Dev version
remotes::install_github("ropensci/taxizedb")
Citation
To cite taxizedb in publications use:
- Chamberlain S, Arendsee Z, Stirling T (2023). taxizedb: Tools for Working with ‘Taxonomic’ Databases. R package version 0.3.1. https://doi.org/10.5281/zenodo.1158055
Meta
- Please report any issues, bugs or feature requests.
- License: MIT
- Get citation information for taxizedb in R with citation(package = 'taxizedb')
- Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.