decide deployment strategy
This might be too early now, though deciding this sometime soon might help with some other decisions (especially reproducibility and testing).
If we end up using a shiny app (and not #24), I see three possibilities:
- RStudio Connect, hosted on-prem at GWDG. License is expensive, but has lots of good stuff, especially if we want to roll this out to external customers.
- shinyapps.io (status quo, easy to do, though somewhat limited reproducibility and no support for plumber, if needed).
- roll-your-own based on shiny server open source, maybe via shinyproxy
This decision, if it ever comes up, is a bit awkward (potential conflict of interest) for me to be involved in, because I have done contracting work for RStudio in the past, and may do so again. I have also been in close contact with the sales and solutions engineering team, though have never discussed my work for subugoe.
Sunlight is the best disinfectant, as they say, so here we go.
Current idea is to deploy to Azure as a container; if that works in #29, this is done.
I've now checked the big cloud vendors for suitable products for hosting our shiny app.
I will check with the GWDG later, but first wanted to figure out the maximum convenience and lowest cost we could get on the open market. We can then compare that with whatever the GWDG can do, and decide.
(I've actually only checked Google Cloud Platform (GCP) and Microsoft Azure (Azure), not Amazon Web Services (AWS), though the offering is probably similar).
These are our requirements:
- We need to supply our app as a docker image, just with suitable `EXPOSE` and `ENTRYPOINT` (or whatever). As of 0e870a7adf7feb7a11fc88802f544dbc50a08435, that's done.
- Shiny is a stateful app (i.e. different requests in the same session must be routed to the same R process in the same container). (Full-on stateless serverless products mostly won't work).
- Shiny requires websockets (kind of related to its stateful nature).
- Given all that, we want as little maintenance as possible:
  - no (web) server maintenance (other than the shiny server inside the container)
  - no kubernetes
  - (horizontal) scaling should be as automatic as possible, ideally down to 0.
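To make the statefulness requirement a bit more concrete, here is a minimal Python sketch of session affinity. It is not how any of the cloud products implement routing; `StickyRouter`, the session id and the container names are all made up for illustration. The only point is that every request of one session has to land on the container that holds that session's R process.

```python
import hashlib


class StickyRouter:
    """Toy router that pins each session to one backend (hypothetical names)."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self.backends = list(backends)
        self.assignments = {}  # session id -> backend

    def route(self, session_id):
        # First request of a session: pick a backend deterministically.
        if session_id not in self.assignments:
            digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
            index = int(digest, 16) % len(self.backends)
            self.assignments[session_id] = self.backends[index]
        # Later requests: return the remembered backend, so the R process
        # holding the session state sees every request of that session.
        return self.assignments[session_id]


router = StickyRouter(["container-a", "container-b", "container-c"])
first = router.route("session-123")
second = router.route("session-123")
print(first == second)
```

A stateless serverless product gives no such guarantee: the second request may hit a fresh container that knows nothing about the session.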
I've tested these products against the above requirements:
- [x] Azure Web Apps for Containers (aka Web Apps for Linux)
  - works: http://hoad.azurewebsites.net (it's slow but that can be fixed)
  - does not scale down to 0
  - costs are probably around EUR 50/month, though this (App Service Plan) can be shared with other users.
- [ ] GCP App Engine Flexible
  - pending
- [x] GCP Cloud Run
  - does not work 😒 (not suitable for stateful apps, much like other serverless offerings, and does not do websockets)
  - would otherwise be ideal, because it actually scales down to 0 and also deploys/scales much faster than other solutions
  - if we end up deploying some R API (such as via {plumber}) as per https://github.com/subugoe/hoad/issues/48, this is probably the way to go.
Current thinking is to use GCP App Engine Flexible, because it seems pretty good and we're already invested in GCP.
Just for future reference: a couple of words on scaling.
- Shiny users (i.e. people using the app at the same time) can share a single R process.
  - (In RStudio's pro products, you can also have one machine spawn several R processes, and load balance the connections between these processes.)
- R is single-threaded, so one user's (expensive-ish) request will block another user's request, even if it's just a small UI thing. (This requires careful design and async patterns whenever possible.) This can quickly make a shiny app unusable. A ballpark figure for usable concurrency is probably ~5 users.
- The alternative is then horizontal scaling, i.e. just adding more machines, each of which can be relatively cheap.
Hence the emphasis on (Kubernetes-based) Container-as-a-Service offerings. (And also, we want to use the container we built for reproducibility).
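As a rough back-of-the-envelope sketch of what that means for horizontal scaling (in Python, just for the arithmetic): assuming one R process per container instance and the ~5 users per process ballpark from above, the number of instances grows like this. `instances_needed` and its parameters are hypothetical names, and the default minimum of 1 reflects that the products that worked so far do not scale down to 0.

```python
import math


def instances_needed(concurrent_users, users_per_process=5, min_instances=1):
    """Estimate container instances, assuming one R process per instance."""
    if concurrent_users < 0:
        raise ValueError("concurrent_users must not be negative")
    if users_per_process < 1:
        raise ValueError("users_per_process must be at least 1")
    # Round up: a partially filled instance still has to be running.
    wanted = math.ceil(concurrent_users / users_per_process)
    return max(min_instances, wanted)


for users in (0, 5, 12, 40):
    print(users, instances_needed(users))
```

The ~5 figure is only a ballpark; an app with cheap requests or good async patterns may carry more users per process.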
And one last general thought: obviously RStudio's pro products (especially RStudio Connect in this context), self-hosted at GWDG or in a public cloud VM, are also an option. I'm not investigating this much further at this point, because that would be a very substantial investment in both funds and maintenance.
I'm especially concerned about the maintenance, which would be pretty significant and probably worth it only for a larger team of intensive users.
re: Google Cloud Run, there still seems to be some debate:
- https://github.com/randy3k/shiny-cloudrun-demo/issues/1
- https://github.com/MarkEdmondson1234/googleCloudRunner/issues/35
- https://github.com/rstudio/shiny/issues/2455
App Engine Flexible also doesn’t scale to zero, last I checked. I did an RStudio build that perhaps can help: https://github.com/MarkEdmondson1234/appengine-rstudio
Seems like with the current (somewhat bloated) container/app we can get away with this:

    resources:
      cpu: 1
      memory_gb: 0.75
      disk_size_gb: 10

`cpu` (vCPU) and `disk_size_gb` are already at their minimum.
This is largely done now; documentation is ongoing at https://github.com/subugoe/shinycaas. Keeping this open until I can point to that documentation.