
write up design principles for hoad

Open · maxheld83 opened this issue 5 years ago · 1 comment

it just occurred to me during the call with @kjgarza that it might be a good idea to write down the draft design principles for hoad that we've been talking about.

There are three levels of user/target segmentation, which correspond to three levels of our code.

  1. Distributed in-memory database. This database should be as generic as possible, in the extreme case just duplicating the Crossref coverage, but with much better performance and support for arbitrary SQL/dplyr queries.
    • Target: Analysts (us).
    • Code:
      • setup of the database (currently Google BigQuery, maybe Azure Synapse)
      • batch jobs to seed the db with dumps and incremental updates
      • example queries
  2. Domain-specific APIs. Opinionated queries against 1 that yield domain-specific data objects (small enough to fit into laptop memory): a set of (multiple!) tidy data frames that make sense for hybrid open access uptake analysis, i.e. that make it possible to run the plots/analyses in 3.
    • Target: R users interested in hybrid OA.
    • Code:
      • dplyr/sql queries against 1
      • additional on-client data wrangling
      • assertions and tests
  3. Dashboard. Views on the data in 2 that answer our business questions.
    • Target: HOAD project stakeholders
    • Code:
      • plots (those are also part of the package proper)
      • dashboard (maybe modules are also part of the package)
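
To make the layering concrete, here is a minimal sketch of how the three levels could depend on each other. It is not hoad code: it uses Python with an in-memory SQLite database as a stand-in for BigQuery and dplyr, and the `works` table, its columns, the sample rows, and all function names are hypothetical. It also assumes, purely for illustration, that a work counts as open access when it has a license value.

```python
import sqlite3


def setup_db():
    """Level 1: a generic store (SQLite stands in for BigQuery here)."""
    con = sqlite3.connect(":memory:")
    # Hypothetical, heavily simplified schema; not the real Crossref layout.
    con.execute(
        "CREATE TABLE works ("
        "doi TEXT PRIMARY KEY, journal TEXT, year INTEGER, license TEXT)"
    )
    # Made-up sample rows, only there so the sketch runs.
    con.executemany(
        "INSERT INTO works VALUES (?, ?, ?, ?)",
        [
            ("10.1/a", "Journal A", 2019, "cc-by"),
            ("10.1/b", "Journal A", 2019, None),
            ("10.1/c", "Journal A", 2020, "cc-by"),
            ("10.1/d", "Journal B", 2020, None),
        ],
    )
    return con


def oa_uptake_by_journal_year(con):
    """Level 2: an opinionated query that yields one small tidy table."""
    rows = con.execute(
        "SELECT journal, year, COUNT(*) AS n_total, COUNT(license) AS n_oa "
        "FROM works GROUP BY journal, year ORDER BY journal, year"
    ).fetchall()
    # On-client wrangling: derive the share from the aggregated counts.
    table = [
        {"journal": j, "year": y, "n_total": t, "n_oa": o, "oa_share": o / t}
        for j, y, t, o in rows
    ]
    # Assertions guard the contract that level 3 relies on.
    for row in table:
        assert 0 <= row["n_oa"] <= row["n_total"]
    return table


def render_view(table):
    """Level 3: a view on the level-2 table (text lines instead of a plot)."""
    return [
        f"{r['journal']} {r['year']}: {r['oa_share']:.0%} OA "
        f"({r['n_oa']}/{r['n_total']})"
        for r in table
    ]


if __name__ == "__main__":
    for line in render_view(oa_uptake_by_journal_year(setup_db())):
        print(line)
```

The point of the sketch is the direction of the dependencies: level 3 only ever sees the tidy table from level 2, and level 2 is the only code that knows the database schema from level 1.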

maxheld83 · Jul 15 '20 13:07

this is just quickly jotted down, should be in the repo somewhere

maxheld83 · Jul 15 '20 13:07