fulltext
:warning: ARCHIVED :warning: Search across and get full text for OA & closed journals
Get full text research articles
Check out the package docs and the fulltext manual to get started.
rOpenSci has a number of R packages to get either full text, metadata, or both from various publishers. The goal of fulltext is to integrate these packages to create a single interface to many data sources.
fulltext makes it easy to do text-mining by supporting the following steps:
- Search for articles: ft_search
- Fetch articles: ft_get
- Get links for full text articles (xml, pdf): ft_links
- Extract text from articles / convert formats: ft_extract
- Collect all texts into a data.frame: ft_table
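The steps above can be chained roughly as follows. This is a minimal sketch, not taken from the package docs: the wrapper name fulltext_workflow is hypothetical, the query and the choice of PLOS as the source are illustrative assumptions, and argument defaults may differ between package versions. It requires the fulltext package and network access, so the function is only defined here, not called.

```r
# Sketch only: fulltext_workflow is a hypothetical wrapper, not part of fulltext.
# It needs the (archived) fulltext package and a network connection to run.
fulltext_workflow <- function(query = "ecology", limit = 5) {
  if (!requireNamespace("fulltext", quietly = TRUE)) {
    stop("The fulltext package is not installed.")
  }
  # 1. Search one data source for matching articles
  hits <- fulltext::ft_search(query = query, from = "plos", limit = limit)
  # 2. Look up full text links (xml, pdf) for the search results
  links <- fulltext::ft_links(hits)
  # 3. Fetch the articles; files are written to the configured cache path
  articles <- fulltext::ft_get(hits)
  # 4. Collect the cached texts into a data.frame
  texts <- fulltext::ft_table()
  list(hits = hits, links = links, articles = articles, texts = texts)
}
```

Calling fulltext_workflow("ecology", limit = 5) would return the intermediate objects from each step, which makes it easy to inspect where a given article was dropped. ft_extract is left out because it is mainly needed when working with PDFs.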
Previously supported use cases, extracted out to other packages:
- Collect bits of articles that you actually need: moved to the pubchunks package
- Supplementary data from papers: moved to the suppdata package
It's easy to go from the outputs of ft_get to text-mining packages such as tm and quanteda.
Data sources in fulltext include:
- Crossref - via the rcrossref package
- Public Library of Science (PLOS) - via the rplos package
- Biomed Central
- arXiv - via the aRxiv package
- bioRxiv - via the biorxivr package
- PMC/Pubmed via Entrez - via the rentrez package
- Scopus - internal tooling
- Semantic Scholar - internal tooling
- Many more are supported via the above sources (e.g., Royal Society Open Science is available via Pubmed)
- We will add more as publishers open up and as we have time. See the issues.
Authentication: A number of publishers require authentication via an API key, and some use even more
draconian processes that involve checking IP addresses. We are working on supporting
the authentication schemes of the different publishers, but of course all the OA content
is already easily available. See the Authentication section in ?fulltext-package after
loading the package.
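API keys are commonly supplied to R through environment variables. The sketch below shows one way to check that a key is set before making requests. Both the helper get_api_key and the variable name MY_PUBLISHER_KEY are hypothetical placeholders; the names fulltext actually looks for are listed in the Authentication section of ?fulltext-package.

```r
# Hypothetical helper, not part of fulltext: read an API key from an
# environment variable and warn when it is missing.
get_api_key <- function(var) {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    warning("No API key found in environment variable ", var)
    return(NULL)
  }
  key
}

# MY_PUBLISHER_KEY is a placeholder name used for illustration only.
Sys.setenv(MY_PUBLISHER_KEY = "abc123")
key <- get_api_key("MY_PUBLISHER_KEY")
```

For keys that should persist across sessions, the same variable can be set in a .Renviron file instead of calling Sys.setenv in a script.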
We'd love your feedback. Let us know what you think in the issue tracker.
Installation
Stable version from CRAN
install.packages("fulltext")
Development version from GitHub
remotes::install_github("ropensci/fulltext")
Load library
library('fulltext')
Interoperability with other packages downstream
Note: this example is not included in the vignettes because that would require adding the two packages below to Suggests. For many more examples and documentation, see the package docs and the fulltext manual.
cache_options_set(path = (td <- 'foobar'))
res <- ft_get(c('10.7554/eLife.03032', '10.7554/eLife.32763'), type = "pdf")
library(readtext)
x <- readtext::readtext(file.path(cache_options_get()$path, "*.pdf"))
library(quanteda)
quanteda::corpus(x)
Contributors
- Scott Chamberlain
- Will Pearse
- Katrin Leinweber
Meta
- Please report any issues or bugs.
- License: MIT
- Get citation information for fulltext: citation(package = 'fulltext')
- Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.