aimmo
Aim for zero-downtime in aimmo deployments
What is the problem that you wish to solve? Please describe. Currently, to deploy Kurono we have to delete all the running games and then start the game-creator again to bring all the games and workers back up. This causes downtime during deployment and also loses the game_states and avatar states, which are currently generated on game pod creation.
Describe the solution you'd like Use Kubernetes Deployments/StatefulSets to boot up a pod with the new version of Kurono while the old one is still running. Once it is ready, move the state over to the new version and then delete the old version's pod.
On shutdown, we persist the game_states and avatar_states. Clients should be notified that they have to disconnect and reconnect to the new pod once it is ready.
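A minimal sketch of the persist-on-shutdown step, assuming the state can be serialised to JSON and that the pod has a writable path that survives the old pod. The names `save_state`, `load_state`, `install_shutdown_hook` and the `get_state` callback are hypothetical and not part of the aimmo codebase. Kubernetes sends SIGTERM to a container before stopping it, so that is the signal hooked here.

```python
import json
import os
import signal
import tempfile


def save_state(path, game_state, avatar_states):
    """Write both states to `path` atomically (temp file, then rename)."""
    payload = {"game_state": game_state, "avatar_states": avatar_states}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    # os.replace is atomic on the same filesystem, so a reader never
    # sees a half-written file.
    os.replace(tmp_path, path)


def load_state(path):
    """Return the persisted payload, or None if nothing was saved."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def install_shutdown_hook(path, get_state):
    """Persist state when the process receives SIGTERM.

    `get_state` is a hypothetical callback returning
    (game_state, avatar_states) for the running game.
    """
    def handler(signum, frame):
        game_state, avatar_states = get_state()
        save_state(path, game_state, avatar_states)
        raise SystemExit(0)

    signal.signal(signal.SIGTERM, handler)
```

The new pod would call `load_state` on startup and fall back to generating fresh state when it returns `None`.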
Describe alternatives you've considered Keep the game state on a shared volume rather than in memory, so that the game pod becomes a stateless processor. This sounds good, but I'm not sure what I/O overhead it would add to each turn.
Additional context This is important if we want to be able to deploy regularly without causing disruption to running games (potentially stopping classes from using Kurono if we deploy mid-session).
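One rough way to estimate that per-turn overhead is to time a synced write of a representative state to the mounted volume. The sketch below assumes JSON serialisation and one full write per turn; `measure_write_overhead` and the file name are hypothetical, and real numbers depend on the volume type and state size.

```python
import json
import os
import time


def measure_write_overhead(directory, state, turns=100):
    """Return the mean seconds per turn spent writing `state` to disk."""
    path = os.path.join(directory, "game_state.json")
    start = time.perf_counter()
    for turn in range(turns):
        snapshot = dict(state, turn=turn)  # copy, so the caller's dict is untouched
        with open(path, "w") as f:
            json.dump(snapshot, f)
            f.flush()
            os.fsync(f.fileno())  # force the write through to the volume
    elapsed = time.perf_counter() - start
    return elapsed / turns
```

Running this against both a local directory and the shared volume mount gives a baseline to compare with the turn interval.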
With any solution, we should look at:
- Whether switching to a new version works
- Record any potential downtime that occurs
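The downtime measurement could be sketched as follows, assuming a probe polls the game at regular intervals during a deploy and logs `(timestamp_seconds, ok)` pairs. The probe format and the function names are assumptions for illustration only.

```python
def downtime_windows(probes):
    """Return (start, end) windows during which probes were failing.

    `probes` is a list of (timestamp_seconds, ok) sorted by timestamp.
    A window starts at the first failed probe and ends at the next
    successful one (or at the last probe if it never recovers).
    """
    windows = []
    start = None
    for timestamp, ok in probes:
        if not ok and start is None:
            start = timestamp
        elif ok and start is not None:
            windows.append((start, timestamp))
            start = None
    if start is not None:
        windows.append((start, probes[-1][0]))
    return windows


def total_downtime(probes):
    """Sum the length of all failure windows, in seconds."""
    return sum(end - start for start, end in downtime_windows(probes))
```

For example, `total_downtime([(0, True), (1, False), (2, False), (3, True)])` returns `2`. The resolution is limited by the probe interval, so a shorter interval gives a tighter bound.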
Not aiming to do this right now; will revisit later.
Using Agones and its update strategy looks like a good way to do this and to reduce the amount of code we have to maintain.
Not relevant.