jstor
Import journal data from DfR (JSTOR)
jstor: Import and Analyse Data from Scientific Articles
Author: Thomas Klebel
License: GPL v3.0
The tool Data for Research (DfR) by JSTOR
is a valuable source for citation analysis and text mining. jstor
provides functions and suggests workflows for importing datasets from
DfR. It was developed to handle very large datasets, which require an
agreement, but it can be used with smaller ones as well.
Note: As of 2021, JSTOR has changed the way it provides data and moved to a new
platform called Constellate. The package jstor has not been adapted to this
change and can therefore only be used for legacy data obtained from the old
DfR platform.
The most important set of functions is a group of jst_get_*
functions:
- jst_get_article
- jst_get_authors
- jst_get_references
- jst_get_footnotes
- jst_get_book
- jst_get_chapters
- jst_get_full_text
- jst_get_ngram
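The list above names the main functions. You can also query the attached package with base R's `ls()` to see which `jst_get_*` functions your installed version actually exports. The result may include helpers beyond those listed here. This is a small convenience sketch and not part of jstor's documented workflow.

```r
library(jstor)

# List all objects in the attached jstor package whose names start with "jst_get_"
fns <- ls("package:jstor", pattern = "^jst_get_")
print(fns)
```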
All functions that deal with metadata (that is, all except
jst_get_full_text and jst_get_ngram) work along the same lines:
- The file is read with xml2::read_xml().
- Content of the file is extracted via XPath or CSS expressions.
- The resulting data is returned in a tibble.
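Below is a minimal sketch of these three steps. It calls xml2 and tibble directly instead of jstor's internal code. The XML snippet, the XPath expressions and the helper `get_basic_meta()` are hypothetical illustrations. Real DfR files contain many more fields, and jstor's actual extraction logic is more thorough.

```r
library(xml2)
library(tibble)

# Hypothetical, heavily reduced article file; real DfR XML has many more elements
xml_string <- "<article>
  <front>
    <article-meta>
      <title-group><article-title>On the Protozoa Parasitic in Frogs</article-title></title-group>
      <volume>41</volume>
      <issue>2</issue>
    </article-meta>
  </front>
</article>"

# Hypothetical helper mirroring the three steps: read, extract via XPath, return a tibble
get_basic_meta <- function(x, file_name = "example") {
  doc  <- xml2::read_xml(x)                               # 1. parse the file (here: a literal XML string)
  meta <- xml2::xml_find_first(doc, ".//article-meta")    # 2. locate the metadata node via XPath
  tibble::tibble(                                         # 3. return the extracted fields as a tibble
    file_name     = file_name,
    article_title = xml2::xml_text(xml2::xml_find_first(meta, ".//article-title")),
    volume        = xml2::xml_text(xml2::xml_find_first(meta, "volume")),
    issue         = xml2::xml_text(xml2::xml_find_first(meta, "issue"))
  )
}

result <- get_basic_meta(xml_string)
print(result)
```

All values come back as character strings here. Converting types (e.g. volume to integer) would be a separate step.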
Installation
To install the package use:
install.packages("jstor")
You can install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("ropensci/jstor")
Usage
In order to use jstor, you first need to load it:
library(jstor)
library(magrittr)
The basic usage is simple: supply one of the jst_get_* functions with
the path to a file and it will return a tibble with the extracted
information.
jst_get_article(jst_example("article_with_references.xml")) %>% knitr::kable()
file_name | journal_doi | journal_jcode | journal_pub_id | journal_title | article_doi | article_pub_id | article_jcode | article_type | article_title | volume | issue | language | pub_day | pub_month | pub_year | first_page | last_page | page_range |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
article_with_references | NA | tranamermicrsoci | NA | Transactions of the American Microscopical Society | 10.2307/3221896 | NA | NA | research-article | On the Protozoa Parasitic in Frogs | 41 | 2 | eng | 1 | 4 | 1922 | 59 | 76 | 59-76 |
jst_get_authors(jst_example("article_with_references.xml")) %>% knitr::kable()
file_name | prefix | given_name | surname | string_name | suffix | author_number |
---|---|---|---|---|---|---|
article_with_references | NA | R. | Kudo | NA | NA | 1 |
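The same pattern applies to the other metadata functions. The sketch below reads the reference list of the bundled example file with `jst_get_references()`. Judging by its name, the example file contains references. The exact columns of the result are not reproduced here, so the sketch only inspects the first rows.

```r
library(jstor)

# Extract the references from the example article shipped with jstor
refs <- jst_get_references(jst_example("article_with_references.xml"))

# The result is a tibble, like the output of jst_get_article() and jst_get_authors()
print(head(refs))
```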
Further explanations, especially on how to use jstor’s functions for importing many files, can be found in the vignettes.
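As a rough illustration of working with several files, the following base R sketch applies one `jst_get_*` function to a vector of paths and stacks the results. The helper `import_many()` and the folder name `my_dfr_folder` are hypothetical. This is not jstor's recommended workflow for large batches, which the vignettes describe. For demonstration, the same example file is simply read twice.

```r
library(jstor)

# Hypothetical helper: apply one jst_get_* function to many files and combine the results
import_many <- function(paths, fun = jst_get_article) {
  results <- lapply(paths, fun)   # one tibble per file
  do.call(rbind, results)         # stack them into a single table
}

# In practice `paths` might come from something like
# list.files("my_dfr_folder", pattern = "\\.xml$", full.names = TRUE)  # hypothetical folder
paths <- rep(jst_example("article_with_references.xml"), 2)
articles <- import_many(paths)
print(nrow(articles))
```

Reading files one by one in memory works for a handful of files. For very large DfR datasets, the batch import functions described in the vignettes are the better fit.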
Getting started
In order to use jstor, you need some data from DfR. From the main
page you can create a dataset by searching
for terms and restricting the search by time, subject and content
type. After creating an account, you can download your selection.
Alternatively, you can download sample
datasets with documents
from before 1923 for the US, and before 1870 for all other countries.
Supported Elements
In their technical specifications, DfR lists fields which should be reliably present in all articles and books.
The following tables give an overview of which elements are supported by
jstor.
Articles
xml-field | reliably present | supported in jstor |
---|---|---|
journal-id (type=“jstor”) | x | x |
journal-id (type=“publisher-id”) | x | x |
journal-id (type=“doi”) | | x |
issn | x | |
journal-title | x | x |
publisher-name | x | |
article-id (type=“doi”) | x | x |
article-id (type=“jstor”) | x | x |
article-id (type=“publisher-id”) | | x |
article-type | | x |
volume | | x |
issue | | x |
article-categories | x | |
article-title | x | x |
contrib-group | x | x |
pub-date | x | x |
fpage | x | x |
lpage | | x |
page-range | | x |
product | x | |
self-uri | x | |
kwd-group | x | |
custom-meta-group | x | x |
fn-group (footnotes) | | x |
ref-list (references) | | x |
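To check which of these article fields a particular file actually contains, you can query the XML directly with xml2. The sketch below is a hypothetical check on the bundled example file. The selection of fields is arbitrary. It assumes the element names appear without a namespace prefix, as in the table above.

```r
library(jstor)
library(xml2)

# Hypothetical check: which of a few fields from the table occur in a given file?
fields <- c("journal-title", "article-title", "volume", "issue", "fpage", "lpage", "kwd-group")
doc <- xml2::read_xml(jst_example("article_with_references.xml"))

# TRUE if at least one element with that name exists anywhere in the document
present <- vapply(
  fields,
  function(f) length(xml2::xml_find_all(doc, paste0(".//", f))) > 0,
  logical(1)
)
print(present)
```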
Books
xml-field | reliably present | supported in jstor |
---|---|---|
book-id (type=“jstor”) | x | x |
discipline | x | x |
call-number | x | |
lcsh | x | |
book-title | x | x |
book-subtitle | x | |
contrib-group | x | x |
pub-date | x | x |
isbn | x | x |
publisher-name | x | x |
publisher-loc | x | x |
permissions | x | |
self-uri | x | |
counts | x | x |
custom-meta-group | x | x |
Book Chapters
xml-field | reliably present | supported in jstor |
---|---|---|
book-id (type=“jstor”) | x | x |
part_id | x | x |
part_label | x | x |
part-title | x | x |
part-subtitle | x | |
contrib-group | x | x |
fpage | x | x |
abstract | x | x |
Code of conduct
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Citation
To cite jstor, please refer to citation(package = "jstor"):
Klebel (2018). jstor: Import and Analyse Data from Scientific Texts. Journal of
Open Source Software, 3(28), 883, https://doi.org/10.21105/joss.00883
Acknowledgements
Work on jstor benefited from financial support for the project
“Academic Super-Elites in Sociology and Economics” by the Austrian
Science Fund (FWF), project number “P 29211 Einzelprojekte”.
Some internal functions regarding file paths and example files were
adapted from the package readr.