Paging
Various paging patterns:
- using query parameters
  - limit & offset: specify the number of records desired, and which record to start at
  - per_page & page: specify the number of records desired per page, and which page to return (page size is variable)
- using link headers
  - headers return some combination of next, first, last, or prev links; if these are provided, we should probably use them in all cases. GitHub recommends that at least
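With the link-header pattern, the client has to pull the next/first/last/prev URLs out of the Link header. Here is a minimal Python sketch of that parsing. It assumes the common <url>; rel="next" layout that GitHub uses. parse_link_header is a hypothetical helper, and splitting on commas is a simplification that would break on URLs containing commas.

```python
import re


def parse_link_header(value):
    """Return a dict mapping each rel value (next, prev, first, last, ...) to its URL."""
    links = {}
    # Simplification: assumes no URL in the header contains a comma
    for part in value.split(","):
        url_match = re.search(r"<([^>]*)>", part)
        rel_match = re.search(r'\brel="?([^";]+)"?', part)
        if url_match and rel_match:
            # A single link may carry several space-separated rel values
            for rel in rel_match.group(1).split():
                links[rel] = url_match.group(1)
    return links


# Hypothetical header in the style GitHub returns
header = (
    '<https://api.example.com/items?page=2&per_page=5>; rel="next", '
    '<https://api.example.com/items?page=4&per_page=5>; rel="last"'
)
links = parse_link_header(header)
```

In practice, Python's requests library already parses this header into Response.links, so a real client would probably rely on that instead of hand-rolled parsing.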
examples
- GitHub
  - params: page, per_page
  - strategy: GH suggests using link headers for paging because paging is sometimes based on SHAs instead of page numbers
- GBIF
  - params: limit, offset
  - strategy: page via query parameters
- Crossref
  - params: rows, offset
- iDigBio
  - params: limit, offset (Elasticsearch-backed, but Elasticsearch is not exposed directly)
- VertNet
  - params: limit, cursor
- Tropicos
  - params: pagesize, startrow
- NOAA NCDC v2
  - params: limit, offset
- CKAN API
  - params: limit, offset
- Berkeley Ecoengine
  - params: page_size, page
- iNaturalist
  - params: per_page, page
- DataCite
  - params: rows, start (Solr-backed)
- PLOS Search API
  - params: rows, start (Solr-backed)
- Europeana API
  - params: rows, start (Solr-backed)
- ORCID API
  - params: rows, start (Solr-backed)
- DPLA API
  - params: page_size, page (Elasticsearch-backed)
- Twitter search API
  - params: count (that's it, AFAICT; not sure this is accurate)
- Enigma
  - params: limit, page
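The same two ideas (a page size plus either an offset or a page number) appear under different names, so one option is to keep those names in a lookup table. This is a rough sketch covering a few of the APIs above. PAGING_PARAMS and build_query are hypothetical. The sketch assumes offset-style parameters are 0-based and page numbers are 1-based, which should be checked against each API's docs.

```python
# Hypothetical lookup table for a few APIs from the list above:
# api -> (style, page-size parameter, position parameter)
PAGING_PARAMS = {
    "github": ("page", "per_page", "page"),
    "inaturalist": ("page", "per_page", "page"),
    "gbif": ("offset", "limit", "offset"),
    "crossref": ("offset", "rows", "offset"),
    "datacite": ("offset", "rows", "start"),  # Solr-style rows/start
}


def build_query(api, size, start):
    """Translate a page size and a 0-based record offset into one API's query params."""
    style, size_param, pos_param = PAGING_PARAMS[api]
    if style == "offset":
        # Offset-style APIs take the 0-based record offset directly
        return {size_param: size, pos_param: start}
    # Page-style APIs: assumes 1-based pages and an offset aligned to the page size
    if start % size != 0:
        raise ValueError("offset must be a multiple of the page size for page-based APIs")
    return {size_param: size, pos_param: start // size + 1}
```

For example, build_query("github", 5, 15) asks for page 4 with per_page 5, while build_query("gbif", 5, 15) passes offset 15 through unchanged. Storing the names as data rather than code is one way to supply the "name of query parameters" input listed under approach.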
approach
Automagically figure out which params to pass, and their values, given user input. We'll need the following from the user:
- name of query parameters
- how many records they want
- what record to start at
- maximum records allowed (if known)
Things we can figure out automatically:
- whether the API uses link headers or not (just look for link headers)
- whether a cursor is used (e.g., VertNet API): look for a cursor in the response
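Here is a sketch of that detection from a single response. detect_paging_style is a hypothetical helper. The top-level "cursor" key is an assumption about where a cursor-based API would put its cursor, not a documented VertNet field.

```python
def detect_paging_style(headers, body):
    """Guess how an API pages from one response.

    headers: dict of response header names to values
    body: the parsed JSON response (usually a dict)
    """
    # HTTP header names are case-insensitive, so normalize before checking
    names = {name.lower() for name in headers}
    if "link" in names:
        return "link"
    # Hypothetical: assumes a cursor API returns it under a top-level "cursor" key
    if isinstance(body, dict) and body.get("cursor"):
        return "cursor"
    # Fall back to plain query-parameter paging (limit/offset, per_page/page, ...)
    return "query"
```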
This will probably require something that can maintain state, like an R6 class.
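Python has no R6, but a plain class shows the kind of state involved. Below is a minimal sketch for the cursor case, where state matters most because each request needs the cursor returned by the previous one. CursorPaginator and the fetch callback shape are hypothetical. fake_fetch stands in for a real cursor-based API such as VertNet's.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical fetch signature: takes the current cursor (None on the first call)
# and returns (records on this page, cursor for the next page or None when exhausted)
Fetch = Callable[[Optional[str]], Tuple[List[Any], Optional[str]]]


class CursorPaginator:
    """Holds paging state between requests, roughly what an R6 object would hold in R."""

    def __init__(self, fetch: Fetch, limit: int):
        self.fetch = fetch
        self.limit = limit                 # total records the user asked for
        self.cursor: Optional[str] = None  # server-issued cursor; None before the first request
        self.records: List[Any] = []       # everything collected so far
        self.done = False

    def next_page(self) -> List[Any]:
        """Fetch one page, update the stored state, and return the new records."""
        if self.done:
            return []
        page, self.cursor = self.fetch(self.cursor)
        # Trim the last page so we never return more than the user asked for
        page = page[: self.limit - len(self.records)]
        self.records.extend(page)
        if self.cursor is None or not page or len(self.records) >= self.limit:
            self.done = True
        return page

    def collect(self) -> List[Any]:
        """Keep requesting pages until the limit is reached or the API runs out."""
        while not self.done:
            self.next_page()
        return self.records


# Stand-in for a real cursor-based API: 12 records served 5 at a time,
# with the cursor encoding where the next page starts
DATA = list(range(12))


def fake_fetch(cursor: Optional[str]) -> Tuple[List[Any], Optional[str]]:
    start = int(cursor) if cursor is not None else 0
    end = start + 5
    return DATA[start:end], (str(end) if end < len(DATA) else None)


first_eight = CursorPaginator(fake_fetch, limit=8).collect()
everything = CursorPaginator(fake_fetch, limit=100).collect()
```

Exposing next_page separately from collect lets a caller stop early or resume later, because the cursor stays on the object between calls.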
paging notes
per_page/page interface (assuming max is > 20):
- limit = 20 -> per_page = 5
- offset = 15 -> page = 4
- chunk_size = 5
- stop -> 20 (so, get 4 pages)
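Here is a small helper that reproduces this arithmetic. It assumes a 0-based record offset and 1-based page numbers, and page_plan is a hypothetical name. When the offset does not fall on a page boundary, it also reports how many records to drop from the front of the first page.

```python
def page_plan(limit, offset, per_page):
    """Return (1-based pages to request, records to skip on the first page)."""
    if limit <= 0:
        return [], 0
    first_page = offset // per_page + 1
    # Index of the last record wanted, converted to the page that contains it
    last_page = (offset + limit - 1) // per_page + 1
    skip = offset % per_page
    # The caller trims any surplus records from the last page down to `limit`
    return list(range(first_page, last_page + 1)), skip
```

With the numbers above, page_plan(20, 0, 5) requests pages 1 through 4, and the chunk starting at offset 15 is page 4.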
limit/offset interface (assuming max is > 20):
- limit = 20
- offset = 0
- chunk_size = 5
- stop -> 20 (so, get 4 pages: {limit = 5, offset = 0}, {limit = 5, offset = 5}, etc.)
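The limit/offset case is simpler, because each chunk maps directly onto request params. This sketch uses a hypothetical name, offset_chunks, and shortens the last chunk when limit is not a multiple of chunk_size:

```python
def offset_chunks(limit, offset, chunk_size):
    """Split a request for `limit` records starting at `offset` into limit/offset pairs."""
    chunks = []
    stop = offset + limit  # e.g. 0 + 20 -> stop at record 20
    start = offset
    while start < stop:
        # The final chunk may be smaller than chunk_size
        size = min(chunk_size, stop - start)
        chunks.append({"limit": size, "offset": start})
        start += size
    return chunks
```

offset_chunks(20, 0, 5) yields the four requests listed above, with offsets 0, 5, 10, and 15.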
moving to next milestone, trying to get MVP out #10