sidewall
sidewall copied to clipboard
Sidewall is a Python library for interacting with the Dimensions search API.
Sidewall
Sidewall is a package for interacting with the Dimensions search API. It provides object classes for Dimensions entities, fetches data incrementally, caches results, copes with rate limits, and more, to make working with Dimensions in Python more natural. "Sidewall" is a loose acronym for Simple Dimensions wrapper client library.
Authors: Michael Hucka
Repository: https://github.com/caltechlibrary/sidewall
License: BSD/MIT derivative โ see the LICENSE file for more information
๐ Log of recent changes
Version 1.0.1: This is a significant bug-fix release.
- Fixed serious bugs in creating
Researcherobjects fromAuthorobjects. - Fixed bugs in setting
current_organizationonPerson,AuthorandResearcherobjects - Fixed bugs setting
affiliationsonResearcherwhen derived fromAuthorobjects - Updated examples in the top-level README file
- Started a CHANGES file
Table of Contents
- Introduction
- Installation instructions
- Using Sidewall
- Basic setup and use
- Basic principles of running queries
- Data mappings
Person, with subclassesAuthorsandResearchersOrganizationPublicationGrantJournal,Category,City,Country,State- Unsupported Dimensions data types
- Getting help and support
- Acknowledgments
- Copyright and license
โ Introduction
Dimensions offers a networked API and search language (the DSL). However, interacting with the DSL currently requires sending a search string to the Dimensions server, then interpreting the JSON results and handling various issues such as iterating to obtain more than 1000 values (which requires the use of multiple queries), staying within API rate limits, and more. Sidewall ("Simple Dimensions wrapper client library") provides a higher-level interface for working more conveniently with the Dimensions DSL and network API. Features of Sidewall include:
- object classes for different Dimensions data entities
- lazy object values filled in automatically behind the scenes
- results iterator fetches data over the net as needed
- automatic caching of search results for speed and efficiency
- automatic throttling to keep within API rate limits
โบ Installation instructions
The following is probably the simplest and most direct way to install this software on your computer:
sudo python3 -m pip install git+https://github.com/caltechlibrary/sidewall.git --upgrade
Alternatively, you can clone this GitHub repository and then run setup.py:
git clone https://github.com/caltechlibrary/sidewall.git
cd sidewall
sudo python3 -m pip install . --upgrade
โถ๏ธ Using Sidewall
Sidewall is meant to be used from other programs; it does not provide a standalone command-line interface or graphical user interface. At this time, Sidewall only supports certain kinds of Dimensions queries as discussed below.
Basic setup and use
To use Sidewall, import the package and the symbol dimensions in your Python code:
import sidewall
from sidewall import dimensions
In case of problems, it may be useful to turn on debugging in Sidewall to see everything that is happening behind the scenes. You can do that by using set_debug() after importing Sidewall:
sidewall.set_debug(True)
To run queries, you will need first to have an account with Dimensions. There are multiple ways of supplying user credentials to Sidewall. The most secure and more convenient way is to invoke the login() method without any arguments:
dimensions.login()
When done this way, Sidewall will use the operating system's keyring/keychain functionality (via keyring) to get the user name and password. If the information does not exist from a previous call to dimensions.login(), Sidewall will ask you for the user name and password interactively, and then store it in the keyring/keychain for next time.
If asking the user for credentials interactively on the command line is unsuitable for the application you are writing, you can also supply a user name and password to the login() method as keyword arguments:
dimensions.login(username = 'somelogin', password = 'somepassword')
Basic principles of running queries
Sidewall defines a method, query(), which you can use to run a search in Dimensions and get back results. The method takes a single argument, a string. Here is an example:
results = dimensions.query('search publications for "SBML" return publications')
The form of the search query string that Sidewall can use is limited in ways described shortly. The query() method returns an object that acts as a Python iteratorโyou can iterate over the results, use len(), and do other operations.
The items returned by the iterator will be Sidewall objects of the kind discussed in the section below on Data mappings. The specific classes of objects returned will correspond to the type of record expressed in the tail end of the query handed to query(). For example, a query that ends in return publications will produce Sidewall Publications objects; a query that ends in return researchers will produce Sidewall Researcher objects; and so on.
Sidewall currently puts the following limitations on the form of the query search string:
- it must begin with
search - it must end with
return publications,return researchers, orreturn grants - it must only return a single type of thing (i.e., researchers or publications or grants)
- it must not put facet specifiers or limits on the returned results
- it must not use aggregation or other advanced DSL features
The following is a complete example of using Sidewall to search for publications containing thes string "SBML", and then printing the year and DOI for each such publication found:
import sidewall
from sidewall import dimensions
dimensions.login()
results = dimensions.query('search publications for "SBML" return publications')
print('Total found: {}'.format(len(results)))
for pub in results:
print('{}: {}'.format(pub.year, pub.doi))
Data mappings
Sidewall defines object classes such as Researcher, Publication, and a few others to represent the different types of entities returned as the results of a Dimensions search query. Sidewall's objects attempt to smooth over some of the confusing aspects of the data representations in Dimensions by providing single objects that consolidate different fields and facets of the same underlying "thing". Further, the fields of an object sometimes are not available from a given query Dimensions performed by the user but may be available if a different kind of query is performed; Sidewall uses this knowledge in some cases to expand object field values automatically and behind the scenes as needed.
The following data classes are defined by Sidewall at this time; note that this is not all the types of data that Dimensions provides today, but future work may improve Sidewall's coverage.
Person, with subclassesAuthorsandResearchersOrganizationPublicationJournalGrant- several very simple objects:
Category,City,Country,State
Person
Dimensions doesn't expose an underlying base class for people; instead, it returns unnamed data structures that basically refer to people in different contexts. Sidewall currently understands two such contexts: authors of publications (when a query uses return publications), and "researchers" (when a query uses return researchers or objects such as Grant contain "researchers" as a data field). Sidewall introduces a parent class called Person because the objects in these two contexts are so similar, and provides two derived classes: Author and Researcher. Both of the derived classes have the same fields. The distinction provided by the derived classes is necessary because the list of affiliations for an Author is relative to a particular publication and may not be all the affiliations that a person has. Thus, affiliations for authors must be understood in the context of a particular search for publications. The use of two classes indicates the context, so that callers can correctly interpret the list of affiliations.
โโโโโโโโโโโโโโโโ
โ Person โ
โโโโโโโโโโโโโโโโ
^
โโโโโโโโโโโดโโโโโโโโโโโ
โโโโโโโโโดโโโโโโโ โโโโโโโโดโโโโโโโโ
โ Author โ โ Researcher โ
โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ
The following table describes the fields and how they relate to values returned from Dimensions:
| Field | Type | In return researchers? |
In return publications? |
In return grants? |
Exp.? |
|---|---|---|---|---|---|
affiliations |
[Organization,ย ...] |
via research_orgs |
โ | โ | โ |
current_organization |
Organization |
n | via current_organization_id |
n | โ |
first_name |
string | โ | โ | โ | n |
middle_name |
string | n | n | โ | n |
id |
string | โ | as researcher_id |
โ | n |
last_name |
string | โ | โ | โ | n |
orcid |
string | as orcid_id |
โ | orcid_id |
โ |
role |
string | n | n | โ | n |
("Exp." โ filled or expanded by Sidewall via search if needed.)
The affiliations field in Sidewall's Person (and consequently Author and Researcher) is a list of Organization class objects (see below). Although affiliations as returned by Dimensions are sparse when using a query that ends with return researchers (they consist only of organization identifiers), Sidewall hides this by providing complete Organization objects for the affiliations field of a Person, and using behind-the-scenes queries to Dimensions to fill out the organization info when the object field values are accessed. Thus, calling programs do not need to do anything to get organization details in a result regardless of whether they use return publications or return researchersโSidewall always provides Organization class objects and handles getting the field values automatically.
To make data access more uniform, Sidewall also replaces the field current_organization_id (which in Dimensions is a string, the identifier of an organization) with the field current_organization. Its value is an Organization object corresponding to the organization whose identifier is found in current_organization_id.
Author class objects are returned when returning publication results, and in those cases, the list of a person's affiliations will reflect their affiliations with respect to a particular publication. However, sometimes it's convenient to get more information about an author, such as the complete list of affiliations that Dimensions has for the person in question. Sidewall allows you to create a Researcher object out of an Author object for that reason. Here is an example to illustrate the differences between authors and researchers and how you can convert the former to the latter:
>>> import sidewall
>>> from sidewall import dimensions, Researcher
>>> dimensions.login()
>>> pubs = dimensions.query('search publications in title_only for "SBML" where year=2003 return publications')
>>> pub = next(pubs)
>>> author1 = pub.authors[0]
>>> author1
<Author ur.0665132124.52>
>>> author1.affiliations
[]
>>> researcher1 = Researcher(author1)
>>> researcher1.affiliations
[<Organization grid.20861.3d>, <Organization grid.10392.39>, <Organization grid.214458.e>]
Finally, note that the field role is present for Researcher objects listed only in the context of Grant results. Its value is not filled in other contexts.
Organization
Sidewall uses the object class Organization to represent an organization in results returned by Dimensions. In Sidewall, the set of fields possessed by an Organization is the union of all fields that Dimensions provides in different contexts for organizations. The following table describes the fields and how they relate to values returned from Dimensions:
| Field | Type | In "return research_orgs"? | In "return publications"? | Sidewall filled? |
|---|---|---|---|---|
acronym |
string | โ | n | โ |
city |
string | n | โ | n |
city_id |
string | n | โ | n |
country |
string | n | โ | n |
country_code |
string | n | โ | n |
country_name |
string | โ | n | โ |
id |
string | โ | โ | n |
name |
string | โ | โ | n |
state |
string | n | โ | n |
state_code |
string | n | โ | n |
Dimensions returns different field values in different contexts. For example, the information about organizations included in an author's affiliation list in a publication is somewhat different from what is provided if a search ending in return research_orgs is used. Sidewall makes the assumption that an organization with a given organization identifier ("grid id") is the same organization no matter the context in which it is mentioned in a search result, and so Sidewall smooths over the field differences and, as with Researcher and Author, queries Dimensions behind the scenes to get missing values when it can (and when they exist).
Publication
The Publication object class is mostly unchanged from the Dimensions publication entity, but in Dimensions, different fields are exposed depending on the type of publication and whether fieldset modifiers are being used. (The available fieldsets for publications are basics, extras, and book.) Sidewall's Publication object class contains all possible fields, but the values of some fields may not be filled in depending on the type of publication in question. For example, journals will not have a value for book_doi. The following table describes the fields in Publication objects:
| Field | Type | In return publications? |
|---|---|---|
altmetric |
string | โ |
authors |
[Author, ...] |
via author_affiliations |
author_affiliations |
[Author, ...] |
via author_affiliations |
book_doi |
string | โ |
book_series_title |
string | โ |
book_title |
string | โ |
date |
string | โ |
date_inserted |
string | โ |
doi |
string | โ |
field_citation_ratio |
string | โ |
id |
string | โ |
issn |
string | โ |
issue |
string | โ |
journal |
Journal |
โ |
linkout |
string | โ |
mesh_terms |
string | โ |
open_access |
string | โ |
pages |
string | โ |
pmcid |
string | โ |
pmid |
string | โ |
proceedings_title |
string | โ |
publisher |
string | โ |
references |
string | โ |
relative_citation_ratio |
string | โ |
research_org_country_names |
string | โ |
research_org_state_names |
string | โ |
supporting_grant_ids |
string | โ |
times_cited |
string | โ |
title |
string | โ |
type |
string | โ |
volume |
string | โ |
year |
string | โ |
Sidewall's Publication objects use a list of Author objects to represent authors, and introduce an alias called authors for the field author_affiliations. The latter alias is for convenience and an attempt to bring more intuitiveness to the structure of publications records. (The name author_affiliations in the Dimensions data is potentially confusing because the name suggests it may be a list of organizations rather than a list of authors. Providing a field named authors removes this ambiguity.)
Grant
The Grant object in Sidewall maps directly to the entity representing grants in Dimensions. The fields in Grants are all identical to the Dimensions results, and use lists of other objects where appropriate. For example, the funders field is created as a list of Organization objects.
| Field | Type |
|---|---|
FOR |
[Category, ...] |
FOR_first |
[Category, ...] |
HRCS_HC |
[Category, ...] |
HRCS_RAC |
[Category, ...] |
RCDC |
[Category, ...] |
abstract |
string |
active_year |
[int, ...] |
date_inserted |
string |
end_date |
string |
funder_countries |
[Country, ...] |
funders |
[Organization, ...] |
funding_aud |
float |
funding_cad |
float |
funding_chf |
float |
funding_eur |
float |
funding_gbp |
float |
funding_jpy |
float |
funding_usd |
float |
funding_org_acronym |
string |
funding_org_city |
string |
funding_org_name |
string |
id |
string |
language |
string |
linkout |
string |
original_title |
string |
project_num |
string |
research_org_cities |
[City, ...] |
research_org_countries |
[Country, ...] |
research_org_name |
string |
research_org_state_codes |
[State, ...] |
research_orgs |
[Organization, ...] |
researchers |
[Researcher, ...] |
start_date |
string |
start_year |
int |
title |
string |
title_language |
string |
The Dimensions data fields in grant entities have an anomaly in that funding_org_city is a string, but cities in another field (research_org_cities) are represented as structured objects. The Grant object in Sidewall does not smooth over this inconsistency in its current version, although perhaps it should in a future release.
Journal, Category, City, Country, State
Rounding out the classes implemented in Sidewall are a small number of very simple classes used to store data that Dimensions returns in structured form: Journal, Category, City, Country, State. They are all basically identical, each containing only two static fields having string values. In the case of Journal one of the fields is named differently (title versus name for the others). More specifically, Journal has the following form:
| Field | Type | In return publications? |
|---|---|---|
| id | string | โ |
| title | string | โ |
All of the other classes (Category, City, Country, State) have the following form:
| Field | Type |
|---|---|
| id | string |
| name | string |
Currently unsupported Dimensions data types
As of this version, Sidewall does not offer support for representing Dimensions policy and patent entities. This is purely due to resource constraints and not due to an inherent limitation in the Sidewall design. Future development could easily add new object classes to support these other data entities.
โ Getting help and support
If you find an issue, please submit it in the GitHub issue tracker for this repository.
โบ๏ธ Acknowledgments
The vector artwork of a car tire used as a logo for this repository was created by Flanker. It is licensed under the Creative Commons Attribution 3.0 Unported license.
Sidewall makes use of numerous open-source packages, without which it would have been effectively impossible to develop Sidewall with the resources we had. We want to acknowledge this debt. In alphabetical order, the packages are:
- humanize โ print numbers in a human-friendly format
- keyring โ access the system keyring service from Python
- requests โ an HTTP library for Python
- setuptools โ library for
setup.py - urllib3 โ HTTP client library for Python
- validators โ data validation package for Python
โฎ๏ธ Copyright and license
Copyright (C) 2019, Caltech. This software is freely distributed under a BSD/MIT type license. Please see the LICENSE file for more information.