dub-registry Reliability of code.dlang.org

With the growing importance of code.dlang.org, we should start to think about reliability. Do you already monitor uptime? How long would it take to move this to another server if the current one is dying? At which size will we need a failover solution and db replication?

Given the amount of work involved and how few server resources we need, this might be a use case for managed hosting.

Sep 23 '16 18:09 MartinNowak

Do you already monitor uptime?

I get an e-mail whenever something is not accessible. Currently the only regular cause is a failure in DMD's exception handling code, which occurs in the reverse proxy that sits in front of the registry process. Uptime is typically >99.9%; the usual CI build failures seem to be caused by GitHub failures instead.
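To put the >99.9% figure in perspective, here is a minimal Python sketch of how an uptime percentage and the matching downtime budget could be computed from periodic probes. It is not the monitoring actually used for code.dlang.org; the function names, the 5-second timeout, and the 30-day window are all assumptions made for illustration.

```python
import http.client
import urllib.error
import urllib.request
from typing import Sequence


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical probe: True if the URL answers with a 2xx/3xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # HTTP errors, connection failures, and timeouts all count as "down".
        return False


def uptime_percent(results: Sequence[bool]) -> float:
    """Share of successful probes, in percent."""
    if not results:
        raise ValueError("need at least one probe result")
    return 100.0 * sum(results) / len(results)


def downtime_budget_minutes(target_percent: float, days: int = 30) -> float:
    """Minutes of downtime allowed in `days` days while still meeting the target."""
    return days * 24 * 60 * (1.0 - target_percent / 100.0)
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of allowed downtime, so a single outage of about an hour in a month would already miss it.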

How long would it take to move this to another server if the current one is dying?

The process would be to copy the last DB backup and to run the dub-registry process (plus setting up a reverse proxy rule in the main web server), so ideally it should be a matter of minutes.

At which size will we need a failover solution and db replication?

I think replication, at least in a simple master/slave setup, should really be done now, even if it wouldn't help to combat GitHub downtime. Once we grow considerably (factor of 10?), I'd say DB-level replication and running multiple dub-registry instances behind a load balancer start to make sense. Right now it would just produce administration/cost overhead with no practical benefit over a simple master/slave solution, which also has the benefit of offering the possibility for federation.
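As a rough illustration of what a simple master/slave arrangement could offer on the client side, the Python sketch below picks the first registry that answers a health check. It covers only failover between already replicated instances, not the DB replication itself. The `pick_registry` function, the injected `probe` callable, and the mirror URL are hypothetical and do not describe how dub or dub-registry actually behave.

```python
from typing import Callable, Optional, Sequence


def pick_registry(urls: Sequence[str], probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first URL for which probe(url) is True, or None if all fail.

    `urls` is ordered by preference: the master first, then any replicas.
    """
    for url in urls:
        if probe(url):
            return url
    return None


# Hypothetical setup: the real registry plus an assumed read-only mirror.
REGISTRIES = ["https://code.dlang.org", "https://mirror.example.org"]

# Simulate the master being down by using a fake probe instead of real requests.
chosen = pick_registry(REGISTRIES, probe=lambda url: "mirror" in url)
print(chosen)
```

Passing the probe in as a parameter keeps the selection logic testable without network access; in practice it would be replaced by a real HTTP health check.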

Sep 25 '16 13:09 s-ludwig

Now that dub is deployed on new servers and doesn't crash that much anymore, I think we can close this.

Jun 13 '18 08:06 WebFreak001