Creates Feature Flag to Prevent Removal of an Allocated Game Server from the Allocation Cache
What type of PR is this?
/kind feature
What this PR does / Why we need it:
Removing a game server from the allocation cache when using Counters and Lists causes the allocator to allocate more game servers than needed. This PR adds a feature flag to prevent the removal of an allocated game server from the cache.
Because the game server is not removed from the cache, and the intent of packing a game server until a Counter or List reaches capacity involves wanting to allocate to the same game server, there will be resource (write) conflicts. In the sample allocation request we use a short allocationBatchWaitTime time, as well as throttling of current requests to control the rate of allocation and reduce the number of resource conflicts. These numbers should be tweaked during testing to find the best throughput of allocations for a particular game without overloading the Kubernetes control plane.
Which issue(s) this PR fixes:
Working on #3992
Special notes for your reviewer:
- This draft includes a sample fleet, sample allocation request, modification of the
allocationBatchWaitTimeof the dev installation, and debug statements that will not be included in the final PR.
Build Failed :sob:
Build Id: fb7e419e-a636-4dcd-bf23-4702d7096718
Status: FAILURE
To get permission to view the Cloud Build view, join the agones-discuss Google Group.
I think #4159 might at least be part of what you actually want.
@dmorgan-blizz FYI this is one proposal for your issue #3992. Let us know how this performs.
FWIW the idea here is not to bring this feature to stable, it's to leave it as a dev or alpha feature flag (off by default) until, or if, there's a more involved solution around a shared allocator cache. This feature only makes sense if the user has a Counter/List, and they want to fill it to capacity before allocating the next game server.
The point of not removing the game server from the list here is so that it will force all allocators to bin pack. If the game server is removed and then placed back in the list you still end up allocating additional unnecessary game servers. It's a trade off of a slower cap on allocation rate vs. not well bin packed.
FWIW the idea here is not to bring this feature to stable, it's to leave it as a dev or alpha feature flag (off by default) until, or if, there's a more involved solution around a shared allocator cache.
Yeah, but the premise isn't sound for what you are stating anyway, as least from my perspective.
This feature only makes sense if the user has a Counter/List, and they want to fill it to capacity before allocating the next game server.
If you have multiple Allocators, this won't do that though. You're always going to hit the disparate cache issue anyway.
The point of not removing the game server from the list here is so that it will force all allocators to bin pack.
Yeah, but that won't work -- if you don't remove it from the list, if there is an attempt to allocate it again, it will fail with a conflict, and therefore won't bin pack.
If the game server is removed and then placed back in the list you still end up allocating additional unnecessary game servers. > It's a trade off of a slower cap on allocation rate vs. not well bin packed.
Not if you (a) put back the freshly allocated game server and (b) put it back in the correct sorting order (binary search is pretty fast!).
This doesn't solve the multi-allocator problem, and there is likely some race conditions in here, but it actually gets you closer to your intended goal than what you have now, which I don't think actually does what you think it does.
Shall we close this and clean things up a little?
Unfortunately, https://github.com/googleforgames/agones/pull/4159 did not solve our allocation problems ;( we see very similar behavior on 1.49 as we did on 1.44, where more gameservers are spun up than needed. For instance
We have not had a chance to test these additional changes, but it appears some solution that caches outside of the allocator process itself is needed (we're running what I believe is the standard min 3 allocators)
@dmorgan-blizz have you seen the conversations and work happening on #4190 ? That should replace this PR.
This works as a simple bin packing fix when used with a significantly lower allocationBatchWaitTime. With with tweaks to the allocationBatchWaitTime and client-side throttling it can handle many requests (can't find my old test runs at the moment with exact numbers) with low conflicts.
That being said, it looks like miai10 has some time to work on a the complex solution. We can close, and if you need something sooner @dmorgan-blizz you can test on this PR even if it's closed, and reopen if needed.
I hadn't seen https://github.com/googleforgames/agones/issues/4190 that sounds great! I'm fine with waiting for that :)