ROpenSci storage for package caching
Lots of packages need caching. When things get complicated enough, packages may need to access their own external data to do their internal stuff. This means caching somewhere, somehow, in a form that is reliable and available. This costs money.
"Oh, I don't have any of that; okay, I can't do that package."
And there it ends. How about considering applications to ROpenSci to (financially) support caching via some suitable provider? The flipper package is a case in point. It works at the moment because it only trawls the CRAN_package_db. We would like to extend this to all man/ directories, to all non-CRAN packages on GitHub, and to many other potential places. That is impossible without some sort of cloud caching scheme.
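For context, a minimal sketch of the kind of local caching that works today, assuming the metadata come from tools::CRAN_package_db() (the cache path and helper name are just illustrative, not flipper's actual internals):

```r
# Illustrative only: cache CRAN package metadata locally so repeated
# queries don't re-download it every time. flipper's real internals may differ.
cache_file <- file.path(tempdir(), "cran_db.rds")  # a persistent path in practice

get_cran_db <- function() {
  if (file.exists(cache_file)) {
    readRDS(cache_file)
  } else {
    db <- tools::CRAN_package_db()  # one big data.frame of CRAN metadata
    saveRDS(db, cache_file)
    db
  }
}

db <- get_cran_db()
```

Scaling that to every man/ directory and every non-CRAN GitHub package is exactly the point where the cache can no longer live on one person's disk.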
Any chance of ROpenSci having an application scheme whereby those with existing ROpenSci packages apply for access to a wee chunk of server space?
I was literally just crafting a proposal for a caching process for drake! I think it's a slightly different use case than what you are proposing here, but maybe we can combine forces and think through all the caching use cases and needs. (See #30)
When it comes to caching, I am a huge fan of @richfitz's storr package. It's a general key-value store with an expanding variety of backends ("drivers"), including storr_rds() and storr_dbi(). Maybe a remote storr driver would help here? Related: http://richfitz.github.io/storr/articles/external.html.
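For anyone who hasn't used it, a quick sketch of the storr API with the rds backend (the path is a throwaway placeholder); a remote driver would slot in at the storr_rds() step:

```r
library(storr)

# key-value store backed by rds files on disk
st <- storr::storr_rds(tempfile("cache-"))

st$set("cran_db", tools::CRAN_package_db())  # cache an expensive object
st$exists("cran_db")                         # TRUE
db <- st$get("cran_db")                      # retrieve it later

st$destroy()                                 # remove the store when done
```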
thoughts (chat with @mpadge and @sckott):
- remote caching
- for people moving between jobs, best to host data with an organization that is longer lived (e.g., ropensci)
- e.g. flipper (see above)
- scheduling: good, but not as important as the caching itself
- in onboarding: could have a checkbox for requesting data caching, and which options are needed (if the package is accepted, we can take over the caching)
- cost: would need a way to estimate cost; roughly package downloads * requests per use of the cache * cost per download from S3
- how does authentication work: hash S3 keys to give to people? can we whitelist certain people?
- we could require that all jobs are updated via our server with our S3 keys, but then people can't update manually
- pulling data from S3 is easy, probably via storr (rough sketch after this list)
- need more use cases!
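To make the last two bullets concrete, here is a rough sketch of "pulling data from S3 via storr", combining storr's external-store fetch hooks (the vignette linked above) with the aws.s3 package. The bucket name and key layout are invented, and this assumes read-only credentials have been handed out as discussed:

```r
library(storr)
library(aws.s3)

bucket <- "ropensci-package-caches"  # hypothetical rOpenSci-hosted bucket

# fetch hook: called only when a key is missing locally; pulls the
# serialised object down from S3
fetch_s3 <- function(key, namespace) {
  aws.s3::s3readRDS(object = paste0(namespace, "/", key, ".rds"),
                    bucket = bucket)
}

# local rds store acts as a cache in front of S3
st <- storr::storr_external(storr::driver_rds(tempfile("s3-cache-")),
                            fetch_s3)

# first call downloads from S3; later calls hit the local copy
# db <- st$get("cran_db")
```

Writes (and so the S3 write keys) could then stay on the rOpenSci side, which fits the "jobs updated via our server" idea, with the trade-off already noted that authors couldn't push updates manually.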