agones
Configurable maximum time without allocation (and shutdown after that time)
Is your feature request related to a problem? Please describe.
This is related to #1782 in that we want a way to reduce costs by not having a large number of idle servers. During development, images are pushed all the time, and it would be prohibitively expensive to have fleets for every single image tag/version. I don't think the other issue quite fits our use case, because we are never completely sure which version a team will need, or when they'll need it, so it's hard to know when to set the replicas to 1 or 0.
Our current solution is that when our matchmaker forms a match, we try to allocate from a fleet with the correct version using the matchLabels functionality of GameServerAllocation. If there's no existing fleet, but there is a matching image in our repository, the service will create a single GameServer object for the client to join once it spins up. The game server has logic to shut down if no one ends up joining it, to avoid dangling pods.
The problem with this approach is that if the shutdown logic somehow fails, you end up with a dangling pod. It's also just not very "clean", having to implement the shutdown logic inside the game server.
Describe the solution you'd like
Would it be possible to configure the maximum amount of time a GameServer can spend not Allocated? Ready is the obvious state to put a time limit on, but I suppose it could also apply to Scheduled, Reserved and RequestReady. I imagine this would work the same way as the health configuration, something like:
maxTimeBeforeAllocation:
  enabled: false # Enable or disable
  initialDelaySeconds: 5 # Initial delay (might not be needed, could just be part of periodSeconds)
  periodSeconds: 300 # Number of seconds spent unallocated before the server is shutdown
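To make the intended semantics a bit more concrete, here is a minimal Go sketch of how such a setting might be evaluated. None of this exists in Agones: the `MaxTimeBeforeAllocation` type, its methods and the `appliesTo` set are hypothetical, and the sketch assumes the initial delay is simply added to the period.

```go
package main

import (
	"fmt"
	"time"
)

// MaxTimeBeforeAllocation mirrors the proposed config block.
// Hypothetical: this type is not part of the Agones API.
type MaxTimeBeforeAllocation struct {
	Enabled             bool
	InitialDelaySeconds int
	PeriodSeconds       int
}

// appliesTo lists the states the limit could cover, per the proposal.
var appliesTo = map[string]bool{
	"Scheduled":    true,
	"RequestReady": true,
	"Ready":        true,
	"Reserved":     true,
}

// Deadline is the total time a GameServer may stay unallocated.
// Assumption: the initial delay is simply added to the period.
func (m MaxTimeBeforeAllocation) Deadline() time.Duration {
	return time.Duration(m.InitialDelaySeconds+m.PeriodSeconds) * time.Second
}

// Expired reports whether a GameServer in the given state that has been
// unallocated for the given duration should be shut down.
func (m MaxTimeBeforeAllocation) Expired(state string, unallocated time.Duration) bool {
	if !m.Enabled || !appliesTo[state] {
		return false
	}
	return unallocated >= m.Deadline()
}

func main() {
	cfg := MaxTimeBeforeAllocation{Enabled: true, InitialDelaySeconds: 5, PeriodSeconds: 300}
	fmt.Println(cfg.Expired("Ready", 4*time.Minute))     // false: 240s is under the 305s deadline
	fmt.Println(cfg.Expired("Ready", 6*time.Minute))     // true: 360s is past the deadline
	fmt.Println(cfg.Expired("Allocated", 6*time.Minute)) // false: allocated servers are never reaped
}
```

With `enabled: false` as the default, `Expired` always returns false, so existing GameServers would be unaffected.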
Describe alternatives you've considered
- Our current implementation isn't awful and has only left a couple of dangling servers, so we could just stick with what we've got. But it would be nice to be cleaner, and this functionality might be useful for others.
- We could have a CronJob that watches for old servers and deletes them. This would involve writing the code for it, adding CI to make the image, managing the deployment etc. Plus it's 1 extra pod. So possible but more work.
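The core of such a CronJob would be a small filter over the GameServers in the namespace. A rough Go sketch of that selection step follows; the `GameServer` struct here is a simplified stand-in (not the Agones type), `staleServers`, `defaultMaxAge` and `demoFixture` are hypothetical names, and the sample data is made up. Listing and deleting the objects through the Kubernetes API is left out.

```go
package main

import (
	"fmt"
	"time"
)

// GameServer is a simplified stand-in for the real resource, holding
// only the fields the reaper needs. Hypothetical, not the Agones type.
type GameServer struct {
	Name    string
	State   string
	Created time.Time
}

// defaultMaxAge is an arbitrary example threshold.
const defaultMaxAge = 10 * time.Minute

// staleServers returns the names of servers that are not Allocated and
// were created more than maxAge before now.
func staleServers(servers []GameServer, maxAge time.Duration, now time.Time) []string {
	var stale []string
	for _, gs := range servers {
		if gs.State != "Allocated" && now.Sub(gs.Created) > maxAge {
			stale = append(stale, gs.Name)
		}
	}
	return stale
}

// demoFixture returns a fixed "now" and some made-up sample servers.
func demoFixture() (time.Time, []GameServer) {
	now := time.Date(2021, 1, 1, 12, 0, 0, 0, time.UTC)
	return now, []GameServer{
		{Name: "gs-old-ready", State: "Ready", Created: now.Add(-20 * time.Minute)},
		{Name: "gs-new-ready", State: "Ready", Created: now.Add(-2 * time.Minute)},
		{Name: "gs-old-allocated", State: "Allocated", Created: now.Add(-60 * time.Minute)},
	}
}

func main() {
	now, servers := demoFixture()
	// Only the old, unallocated server is selected for deletion.
	fmt.Println(staleServers(servers, defaultMaxAge, now))
}
```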
One thing that makes me hesitant to build this into the game server spec is that it wouldn't work well with fleets, because the game server would be replaced by the fleet controller, which would just create churn in the system without any real benefit.
But for your use case, where you create individual game servers (and not fleets), another alternative would be to build this into your game server.
The simple game server example can automatically shut down after being allocated for a specified amount of time: https://github.com/googleforgames/agones/blob/main/examples/simple-game-server/main.go#L124
Your scenario would be similar -- you could start a timer once the server transitioned to ready and then abandon it if the server moved from ready -> allocated. If the second transition doesn't happen before the timer expires, then the game server exits.
If you don't want to modify the game server, you could add a second, tiny container to your pod whose only job would be to perform this self-destruct action. You could have a tiny Go binary that just watches for the state transition and calls shutdown when necessary. Then your game server container wouldn't need any extra logic.
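As a rough illustration of the timer logic, whether it lives in the game server or in a sidecar, here is a minimal Go sketch. It assumes state changes arrive on a channel of state names; in a real sidecar they would presumably come from the Agones SDK's WatchGameServer callback, and a `true` result would be followed by a call to the SDK's Shutdown. The names `shouldSelfDestruct` and `demoLimit` are made up for this example.

```go
package main

import (
	"fmt"
	"time"
)

// demoLimit is an arbitrary, short limit so the example finishes quickly.
const demoLimit = 50 * time.Millisecond

// shouldSelfDestruct consumes state updates and returns true if the
// server stayed unallocated for longer than limit after becoming Ready.
// It returns false once the server is Allocated or the channel closes.
func shouldSelfDestruct(states <-chan string, limit time.Duration) bool {
	var timer *time.Timer
	var expired <-chan time.Time // nil until Ready; a nil channel never fires in select
	for {
		select {
		case s, ok := <-states:
			if !ok {
				return false
			}
			switch s {
			case "Ready":
				if timer == nil { // start the countdown on the first Ready
					timer = time.NewTimer(limit)
					expired = timer.C
				}
			case "Allocated":
				if timer != nil {
					timer.Stop() // abandon the countdown
				}
				return false
			}
		case <-expired:
			return true
		}
	}
}

func main() {
	allocated := make(chan string, 2)
	allocated <- "Ready"
	allocated <- "Allocated"
	fmt.Println("allocated in time, shut down:", shouldSelfDestruct(allocated, demoLimit))

	idle := make(chan string, 1)
	idle <- "Ready"
	fmt.Println("never allocated, shut down:", shouldSelfDestruct(idle, demoLimit))
}
```

Keeping the decision in a plain function like this makes it easy to test without a cluster; only the thin wrapper that feeds the channel and triggers the shutdown needs the SDK.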
> If you don't want to modify the game server, you could add a second, tiny, container to your pod that's only job would be to perform this self-destruct action.
I +1 this idea 🙂
Sorry for the late reply.
Yeah, we've currently built the functionality into the game server but have had issues with reliability. Because the server wasn't being shut down cleanly, there were problems with outstanding HTTP requests and that kind of thing, and we've had to spend time tracking down those problems.
The extra sidecar suggestion is a good one, although it's similar to the CronJob suggestion in that it's extra work managing the image. We're also considering adding a background task to the service that creates the standalone GameServer, to monitor for old GameServers in the namespace.
The idea with the config suggestion was that it would be disabled by default, so it wouldn't interfere with fleets, and you could just turn it on when creating a GameServer object directly. I assume some sort of validation could even be added to make sure it's not turned on for fleet GameServer specs. But I definitely understand this could be confusing for new users, so it would be nice to have but is maybe not the best idea.
It keeps coming up, but I'm feeling like a library of different open source sidecars listed here: https://agones.dev/site/docs/third-party-content/libraries-tools/ would be a great way to (a) explore these ideas and (b) grow the ecosystem of tooling around Agones.
Since it strongly seems like this won't be part of core Agones, I'm going to mark this as stale and close in a few weeks if there are no objections.
Would still love it as a third party ecosystem project, but would like to clean this up since it won't be in Agones.
Coming back around and closing, since there have been no objections since this was marked as stale.