Reduce server costs to fit easily within free budget => Cron Job overhaul
Summary
Each server costs $2-3 per day to run. This could be an unpleasant surprise for someone setting up their own servers during development. It would be nice if running several servers (or all of them) fit comfortably within the free budget for server costs.
I believe most of that cost comes from the refreshCaseStats cron job, which runs every five minutes. Otherwise the servers should sit idle, generating no costs until they get a request. The proposal is a lazy cron job that's only active when people are using the server.
Modify getCaseStats method:
https://github.com/WorldHealthOrganization/app/blob/master/server/appengine/src/main/java/who/WhoServiceImpl.java#L70-L95
if (data older than 1 hour or never been fetched) {
  if (isProduction() && has been fetched before) {
    send alert: data shouldn't be this stale, likely an issue with the COVID-19 dashboard
  }
  // blocking call, run at most every 5 minutes
  // must make sure stale data doesn't create a cascade of `refreshCaseStats` requests
  // good for testing the system; avoids the server returning stale data
  refreshCaseStats();
} else if (data older than 5 minutes) {
  // non-blocking call, scheduled at most every 5 minutes
  schedule(refreshCaseStats);
}
serve data from cache
Scheduling non-duplicative work means there should be at most one outstanding refreshCaseStats request at a time. In production, the data will be refreshed every 5 minutes. In development, the servers will go quiet when there are no active requests, reducing costs to almost zero.
https://github.com/WorldHealthOrganization/app/blob/master/server/appengine/src/main/webapp/WEB-INF/cron.yaml
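The pseudocode above could be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions, not the actual WhoServiceImpl code: `LazyCaseStatsCache`, its constructor parameters, and the `staleAlert` hook are hypothetical names, `fetcher` stands in for refreshCaseStats, and how a background refresh would really be scheduled on App Engine is left open (an `Executor` is used as a placeholder).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical lazy cache: refreshes only when a request notices stale data. */
class LazyCaseStatsCache<T> {
  static final Duration SOFT_TTL = Duration.ofMinutes(5); // refresh in the background
  static final Duration HARD_TTL = Duration.ofHours(1);   // refresh before serving

  private final Supplier<T> fetcher;       // stands in for refreshCaseStats
  private final Supplier<Instant> clock;   // injectable so the logic can be tested
  private final Executor background;       // placeholder for the real scheduler
  private final Runnable staleAlert;       // pass a no-op outside production
  private final AtomicBoolean refreshScheduled = new AtomicBoolean(false);

  private T cached;
  private Instant fetchedAt; // null until the first successful fetch

  LazyCaseStatsCache(Supplier<T> fetcher, Supplier<Instant> clock,
                     Executor background, Runnable staleAlert) {
    this.fetcher = fetcher;
    this.clock = clock;
    this.background = background;
    this.staleAlert = staleAlert;
  }

  T get() {
    Duration age = age();
    if (age == null || age.compareTo(HARD_TTL) > 0) {
      if (age != null) {
        staleAlert.run(); // data was fetched before but is now very stale
      }
      refreshIfOlderThan(HARD_TTL); // blocking
    } else if (age.compareTo(SOFT_TTL) > 0
        && refreshScheduled.compareAndSet(false, true)) {
      // Non-blocking: at most one background refresh is outstanding at a time.
      background.execute(() -> {
        try {
          refreshIfOlderThan(SOFT_TTL);
        } finally {
          refreshScheduled.set(false);
        }
      });
    }
    synchronized (this) {
      return cached;
    }
  }

  private synchronized Duration age() {
    return fetchedAt == null ? null : Duration.between(fetchedAt, clock.get());
  }

  // Re-checks the age under the lock so that concurrent callers who all saw
  // stale data trigger a single fetch rather than a cascade.
  private synchronized void refreshIfOlderThan(Duration ttl) {
    Duration age = age();
    if (age != null && age.compareTo(ttl) <= 0) {
      return; // someone else refreshed while we were waiting
    }
    cached = fetcher.get();
    fetchedAt = clock.get();
  }
}
```

With this shape, an idle server performs no fetches at all, and a busy one fetches at most about once per 5-minute window. Passing the clock and executor in as parameters is a design choice made here so the staleness rules can be exercised without waiting for real time to pass.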
Checklist:
- [x] Searched the existing issues to ensure you are not creating a duplicate.
- [x] Followed the Contributor Guidelines.
@matthewblain - not a priority but nice to have for others doing server development so they don't get any surprise bills
This should be easy to implement: 1: On every request, enqueue a Task Queue entry (targeting the same URL that the cron job uses), using a named task whose name is (a hash of) the desired refresh time, e.g. time_t rounded down to a 5-minute interval. 2: (Optional) Keep the cron job with a timing of 1x/day or so, so there's always some data.
This could be done in all environments, or configurable somehow.
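A minimal sketch of the named-task idea, assuming the task name is derived from the current time rounded down to a 5-minute bucket. `RefreshTaskScheduler`, `taskNameFor`, and `enqueueIfAbsent` are hypothetical names, and the in-memory set only stands in for the queue's own name-based de-duplication (with the App Engine Task Queue API, adding a task whose name is already in use is rejected with a `TaskAlreadyExistsException`, which the caller would simply ignore).

```java
import java.time.Instant;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper illustrating de-duplication via time-bucketed task names. */
class RefreshTaskScheduler {
  static final long INTERVAL_SECONDS = 5 * 60;

  // Stand-in for the task queue, which would reject a duplicate task name itself.
  private final Set<String> knownTaskNames = ConcurrentHashMap.newKeySet();

  /** All requests within the same 5-minute window map to the same task name. */
  static String taskNameFor(Instant now) {
    long bucket = Math.floorDiv(now.getEpochSecond(), INTERVAL_SECONDS);
    return "refresh-case-stats-" + bucket;
  }

  /** Returns true only for the first request in each window. */
  boolean enqueueIfAbsent(Instant now) {
    return knownTaskNames.add(taskNameFor(now));
  }
}
```

Because every request in a given window produces the same name, any number of requests results in at most one refresh per window, and no requests means no refresh work at all.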
@matthewblain - I'd be open to either approach as long as all servers provide fresh data and the non-production systems sit comfortably inside the free GCP budget.