decide deployment strategy

Open maxheld83 opened this issue 5 years ago • 9 comments

This might be too early now, though deciding this sometime soon might help with some other decisions (especially reproducibility and testing).

If we end up using a shiny app (and not #24), I see three possibilities:

  1. RStudio Connect, hosted on-prem at GWDG. License is expensive, but has lots of good stuff, especially if we want to roll this out to external customers.
  2. shinyapps.io (status quo, easy to do, though somewhat limited reproducibility and no support for plumber, if needed).
  3. roll-your-own based on shiny server open source, maybe via shinyproxy

maxheld83 avatar Feb 02 '20 11:02 maxheld83

This decision, if it ever comes up, is a bit awkward (potential conflict of interest) for me to be involved in, because I have done contracting work for RStudio in the past, and may do so again. I have also been in close contact with the sales and solutions engineering team, though have never discussed my work for subugoe.

Sunlight is the best disinfectant, as they say, so here we go.

maxheld83 avatar Feb 02 '20 11:02 maxheld83

Current idea is to deploy to Azure as a container; if that works in #29, this is done.

maxheld83 avatar May 28 '20 15:05 maxheld83

I've now checked the big cloud vendors for suitable products for hosting our shiny app.

I will check with the GWDG later, but first wanted to figure out the maximum convenience and lowest cost we could get on the open market. We can then compare that with whatever the GWDG can do, and decide.

(I've actually only checked Google Cloud Platform (GCP) and Microsoft Azure (Azure), not Amazon Web Services (AWS), though the offering is probably similar).

These are our requirements:

  1. We need to supply our app as a docker image, just with suitable EXPOSE and ENTRYPOINT (or whatever). As of 0e870a7adf7feb7a11fc88802f544dbc50a08435, that's done.
  2. Shiny is a stateful app (i.e. different requests in the same session must be routed to the same R process in the same container). (Full-on stateless serverless products mostly won't work).
  3. Shiny requires websockets (kind of related to its stateful nature).
  4. Given all that, we want as little maintenance as possible
  • no (web) server maintenance (other than the shiny server inside container)
  • no kubernetes
  • (horizontal) scaling should be as automatic as possible, ideally down to 0.
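
To illustrate requirement 2 (all requests of one session must reach the same R process), here is a minimal Python sketch of sticky routing. It is only an illustration: the backend names and the `pick_backend` helper are hypothetical, and hosted platforms typically implement session affinity differently (for example with a cookie) rather than by hashing a session id.

```python
import hashlib


def pick_backend(session_id: str, backends: list[str]) -> str:
    """Return the backend that should serve every request of this session."""
    if not backends:
        raise ValueError("at least one backend is required")
    # Use a stable hash (not Python's per-process salted hash()) so that
    # every router process maps a given session to the same backend.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical container names, for illustration only.
backends = ["container-a", "container-b", "container-c"]
first = pick_backend("session-123", backends)
second = pick_backend("session-123", backends)
print(first == second)  # both requests of the session land on the same container
```

Note that the mapping changes whenever the list of backends changes, which would break running sessions. A stateless request/response product has no need for this kind of routing, which is one way to see why such products tend not to fit shiny.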

I've tested these products against the above requirements:

  • [x] Azure Web Apps for Containers (aka Web Apps for Linux)
    • works: http://hoad.azurewebsites.net (it's slow but that can be fixed)
    • does not scale down to 0
    • costs are probably around EUR 50/month, though this (App Service Plan) can be shared with other users.
  • [ ] GCP App Engine Flexible
    • pending
  • [x] GCP Cloud Run
    • does not work 😒 (not suitable for stateful apps, much like other serverless offerings, and does not do websockets)
    • would be otherwise ideal, because it actually scales down to 0 and also deploys/scales much faster than other solutions
    • if we end up deploying some R API (such as via {plumber}) as per https://github.com/subugoe/hoad/issues/48 this is probably the way to go.

Current thinking is to use GCP App Engine Flexible, because it seems pretty good and we're already invested in GCP.

maxheld83 avatar Jun 09 '20 13:06 maxheld83

Just for future reference: a couple of words on scaling.

  • Shiny users (i.e. people using the app at the same time) can share a single R process.
    • (In RStudio's pro products, you can also have one machine span several R processes, and load balance the connections between these processes.)
  • R is a single-threaded application, so one user's (expensive-ish) request will block another user's request, even if it's just a small UI thing. (This requires careful design and async patterns whenever possible). This can quickly make a shiny app unusable. A ballpark figure of usable concurrency is probably ~5 users.
  • The alternative is then horizontal scaling, i.e. just adding more machines, each of which can be relatively cheap.
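
The horizontal-scaling arithmetic above can be sketched in a few lines of Python. This is a rough back-of-the-envelope helper, not a measurement: `instances_needed` is a hypothetical name, and the default of 5 users per instance is just the ballpark figure from the list above, assuming one R process per container.

```python
import math


def instances_needed(concurrent_users: int, users_per_instance: int = 5) -> int:
    """Estimate how many single-process containers are needed."""
    if concurrent_users < 0:
        raise ValueError("concurrent_users must not be negative")
    if users_per_instance < 1:
        raise ValueError("users_per_instance must be at least 1")
    # Round up: a partially filled instance still has to run.
    # With no users at all the result is 0, i.e. scaling down to zero.
    return math.ceil(concurrent_users / users_per_instance)


for users in (0, 3, 5, 12):
    print(users, "users ->", instances_needed(users), "instance(s)")
```

Since each instance can be small, the cost grows roughly linearly with the number of concurrent users, which is why automatic scaling (ideally down to 0) matters for the requirements.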

Hence the emphasis on (Kubernetes-based) Container-as-a-Service offerings. (And also, we want to use the container we built for reproducibility).

maxheld83 avatar Jun 09 '20 13:06 maxheld83

And one last general thought: obviously RStudio's Pro products (especially RStudio Connect in this context), self-hosted at GWDG or in a public cloud VM, are also an option. I'm not investigating this much further at this point, because that would be a very substantial investment in both funds and maintenance.

I'm especially concerned about the maintenance, which would be pretty significant and probably worth it only for a larger team of intensive users.

maxheld83 avatar Jun 09 '20 13:06 maxheld83

re: Google Cloud Run, there still seems to be some debate:

  • https://github.com/randy3k/shiny-cloudrun-demo/issues/1
  • https://github.com/MarkEdmondson1234/googleCloudRunner/issues/35
  • https://github.com/rstudio/shiny/issues/2455

maxheld83 avatar Jun 09 '20 14:06 maxheld83

App Engine Flexible also didn't scale to zero last I checked. I did an RStudio build that perhaps can help: https://github.com/MarkEdmondson1234/appengine-rstudio

MarkEdmondson1234 avatar Jun 09 '20 21:06 MarkEdmondson1234

Seems like with the current (somewhat bloated) container/app we can get away with this:

resources:
  cpu: 1
  memory_gb: 0.75
  disk_size_gb: 10

vCPU and disk_size_gb are already at their minimum.

maxheld83 avatar Jun 15 '20 19:06 maxheld83

This is largely done now. Documentation is ongoing at https://github.com/subugoe/shinycaas and I'm keeping this open until I can point to that documentation.

maxheld83 avatar Jul 29 '20 15:07 maxheld83