rcrossref icon indicating copy to clipboard operation
rcrossref copied to clipboard

support paid-for metadata plus subscription

Open maxheld83 opened this issue 5 years ago • 1 comments

We're using now using a metadata plus (mdplus) subscription @subugoe and @njahn82 and myself wanted to contribute support for mdplus back upstream.

This is still a very early draft with lots (test, documentation) to be added; just wanted to log this here to show that it's being worked on.

I also have a couple of questions/ideas associated with mdplus support I'd love your feedback on @sckott and @njahn82 before I proceed. If you can, please let me know which of these (if any) you'd find worthwhile and would accept in the same (or separate?) PRs:

  • [ ] change the environment variables crossref_email and (tba.) crossref_plus to all uppercase as per Google's shell styleguide. This is just cosmetic, but I wonder whether it might help people recognise that these are "just" normal environment variables. (The lowercase env var would have to continue to work so as not to break stuff).

  • [ ] export and document get_email() and get_md_plus_token() together (maybe with below rcrossref_header(), and link from the README.md and other places to that documentation, so "politeness" and md-plus "auth" are in one place.

  • [ ] advise users to keep their .Renvirons safe, link to related resources (i.e. using secrets in CI or OS keychain facilities) as well as resources on user/project .Renvirons. Maybe add a sentence that .Renvirons are really just a way to add env vars to R processes. The mdplus crossref token is somewhat sensitive; at least it should not be published.

  • [ ] I noticed that there are a bunch of calls like the below in the codebase: https://github.com/ropensci/rcrossref/blob/4d38a0aa2bb2fa2a27ee31658a92558cc62b3a13/R/cr_citation_count_async.R#L27-L30

    Which now all need get_md_plus_token() as well. While we're at it, what's the best way to factor this out? Maybe add and document an rcrossref_header() with these defaults and run that? Or is there a better, more idiomatic way to declare these defaults with crul? I've only used httr so far, so apologies for my ignorance here.

  • [ ] add to NEWS.md.

  • [ ] add to README.Rmd.

  • [ ] add tests. I'd like to test that when crossref_plus is available, we're actually getting the improved service, though this is an integration test, not a unit test, I guess. (the ropensci/rcrossref repo could store the token as a secret, and the test would be skipped whenever that is unavailable -- that's what I'm currently doing in https://github.com/subugoe/metacheck/commit/2efc2da05d94c6b806f33dcc076f78c996fc64db). I'd like to test this by snapshotting (or parsing and comparing) the response headers with and without mdplus. Among other things (?) at least the rate limits should be reliably higher for mdplus. Is this a good way to test? Again, I'm ignorant about the crul/vcr/webmockr etc HTTP testing ecosystem, so I'd appreciate any pointers.

    If comparing response headers is an acceptable test, is there a good way to capture it from, say cr_abstract('10.1109/TASC.2010.2088091', verbose = TRUE)? I couldn't get ahold of the response header with capture.output(), try as I might, but I'm probably missing something.

    (I also tried benchmarking the clock time with/without mdplus, but that's too flaky as I should've expected).

maxheld83 avatar Dec 08 '20 08:12 maxheld83

Thanks @maxheld83

In general I prefer PR's that are focused and smaller rather than less focused and larger. So I'd prefer unrelated topics in separate PRs

  • Yes, we can change to upper case env vars. I also prefer upper case, I maybe decided on lowercase when I wasn't so sure what was correct. And yes, still need to support both
  • Why export get_email and get_md_plus_token?
  • We can definitely document rcrossref_header(), but what is the use case for exporting it?
  • Sure, sounds good to add more text on keeping secrets safe
  • Yes, we can factor out adding headers. Nothing special about it in crul vs. httr, just that I haven't taken time to do it. Note that not everyone will use md plus, so I'd only include the md plus header if the user has the env var set
  • yes on news and readme
  • tests: sounds good. yes, definitely skip any tests that require mdplus token if that token is not avail. For learning about http testing, check out in progress book https://books.ropensci.org/http-testing/ (let us know if any feedback, @maelle is working on it right now) To see response headers, yes you can use verbose=TRUE, or crul::set_verbose() (sets verbose globally in the R session). I'd probably compare headers in tests using vcr, e.g.,
without_mdplus <- vcr::use_cassette("compare_response_headers_without_mdplus", {
    cr_abstract('10.1109/TASC.2010.2088091')
})
with_mdplus <- vcr::use_cassette("compare_response_headers_with_mdplus", {
    cr_abstract('10.1109/TASC.2010.2088091') # however that's triggered
})
# load the stored on disk http requests
w_o <- yaml::yaml.load_file(without_mdplus$file())
with <- yaml::yaml.load_file(with_mdplus$file()) 
# get to headers like
w_o$http_interactions[[1]]$response$headers
with$http_interactions[[1]]$response$headers
  • One thing to think about it: If a user can use mdplus, do they ever not want to use it? If so, then how would they not use it? If you can't imagine mdplus not being used if one has access then I guess everything is simpler
  • I assume I personally wouldn't be able to test mdplus behavior, correct? I guess we can rely on testing on CI at least if you put your mdplus credentials in the repo
  • Are there any new API routes that are only available to mdplus?

sckott avatar Dec 10 '20 19:12 sckott