Arbitrary Counts and Lists for GameServers, SDKs and Allocation
Objective
With the recent work on Player Tracking, as well as its crossover into High Density Game Server support and re-allocation of Allocated GameServers, it seems that being able to provide arbitrary count values and/or lists of values tied to GameServers, much like Player Tracking values are right now, would be very useful for a wide variety of use cases.
This feature design proposes replacing Player Tracking with a generic way to track general counts and lists against a GameServer by a user-provided key, with integrated Allocation, Fleet scheduling and SDK support. This would support the player tracking use case as it currently stands, but also use cases like multi-tenant room server counting, or any other game-specific value that could be utilised for a custom integration.
An added benefit would be that simple gauge data would be exposed as metrics as well, although we may not want to advocate this as a blessed path for exporting metrics alone, without taking advantage of the other functionality.
This feature would be built behind the GameServerCountsAndLists feature gate and should be on par with PlayerTracking before the PlayerTracking functionality is removed.
Requirements
- Define on a GameServer a set of lists and/or counters, each attached to an arbitrary, user-supplied key.
- Out of scope: The ability to add, edit or delete GameServer keys for counters and lists at runtime. Keys should be explicitly predefined in the GameServer definition, to put some limits on what can be stored in etcd and ideally avoid overloading the Kubernetes API control plane (although we will need strong documentation about this, as this will definitely put extra load on the control plane).
Counters
- Counters can have an initial value (0 is the default).
- Counters can have a set capacity (maximum value). By default the capacity is 0, which means no explicit limit (i.e. max int64).
- We are deliberately using the term “capacity” for both lists and counters, to be consistent between the two pieces of functionality.
- Incrementing above the set capacity or decrementing below 0 will be a no-op, i.e. no increment/decrement operation on a counter will error.
- Counters must be >= 0
- SDK capability to atomically get, increment, decrement and set a counter's local value, which is then set to the backing GameServer CRD status.
- Note: There are race conditions we can’t avoid between SDK updates and Allocation or external updates.
- SDK capability to change the maximum value for a counter.
- The ability to atomically increment or decrement counts on allocation
- If a user wants to ensure there is room for the increment or decrement, that should be explicitly included in the filter options (i.e. decrement by one, but filter for counts that are > 0 so that there is something to decrement).
- If an attempt is made to increment/decrement a counter on a GameServer that does not have it (e.g. through an allocation), the operation is ignored.
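The counter rules above can be illustrated with a small Go model. It is a sketch only: Counter, Increment and Decrement are hypothetical names rather than the Agones API, a capacity of 0 is assumed to mean max int64, and "no-op" is read as leaving the count untouched whenever an operation would cross 0 or the capacity (clamping to the bound would be another possible reading).

```go
package main

import (
	"fmt"
	"math"
)

// Counter is a hypothetical model of one counter in a GameServer's status.
type Counter struct {
	Count    int64
	Capacity int64 // 0 means "no explicit limit", i.e. max int64
}

// limit returns the effective maximum value of the counter.
func (c *Counter) limit() int64 {
	if c.Capacity == 0 {
		return math.MaxInt64
	}
	return c.Capacity
}

// Increment adds amount unless that would exceed the capacity, in which case
// nothing changes. It never errors; the result reports whether it applied.
func (c *Counter) Increment(amount int64) bool {
	if amount < 0 || c.Count > c.limit()-amount {
		return false
	}
	c.Count += amount
	return true
}

// Decrement subtracts amount unless that would take the count below 0.
func (c *Counter) Decrement(amount int64) bool {
	if amount < 0 || c.Count < amount {
		return false
	}
	c.Count -= amount
	return true
}

func main() {
	c := &Counter{Count: 1, Capacity: 3}
	applied := c.Increment(2)
	fmt.Println(applied, c.Count) // true 3
	applied = c.Increment(1)
	fmt.Println(applied, c.Count) // false 3
	applied = c.Decrement(4)
	fmt.Println(applied, c.Count) // false 3
}
```

Because out-of-range operations report false instead of erroring, callers that need room for an increment or decrement should filter for it explicitly, as described above.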
Lists
- Can set a capacity. Defaults to 1000.
- Capacity can be no more than 1000 items. This could possibly be expanded in the future depending on use cases and/or performance.
- SDK capability to atomically add, remove and check if values are in a list’s local value, which is then set to the backing GameServer CRD status.
- Note: There are race conditions we can’t avoid between SDK updates and Allocation or external updates.
- SDK capability to change the capacity (local and backing CRD status value)
- The ability to atomically add items to a list on allocation
- Attempts to add to a list that is at capacity will silently fail, since all operations are asynchronous. If you need to ensure there is space for an append operation, check with filters and/or the SDK first.
- If an attempt is made to append to a list on a GameServer that does not have it (e.g. through an allocation), the operation is ignored.
- The ability to change the capacity from an allocation
- Lists are essentially Sets ordered by insertion, i.e. a List cannot contain more than one instance of a value. An attempt to insert a duplicate item into a List will be a no-op.
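A list with these properties behaves like an insertion-ordered set with a size limit. The sketch below assumes an undeclared capacity arrives as 0 and is replaced by the default of 1000; List, NewList and Append are hypothetical names used only for illustration.

```go
package main

import "fmt"

// maxListCapacity is both the default and the upper bound for list capacity.
const maxListCapacity = 1000

// List is a hypothetical model of one list in a GameServer's status: unique
// values kept in insertion order, bounded by a capacity.
type List struct {
	Capacity int
	Values   []string
}

// NewList applies the default capacity (assumed to be signalled by 0) and
// rejects capacities outside 1..1000.
func NewList(capacity int) (*List, error) {
	if capacity == 0 {
		capacity = maxListCapacity
	}
	if capacity < 1 || capacity > maxListCapacity {
		return nil, fmt.Errorf("capacity %d is outside 1..%d", capacity, maxListCapacity)
	}
	return &List{Capacity: capacity}, nil
}

// Contains reports whether value is already in the list.
func (l *List) Contains(value string) bool {
	for _, v := range l.Values {
		if v == value {
			return true
		}
	}
	return false
}

// Append adds value to the end of the list. A duplicate value, or an append to
// a full list, is a silent no-op; the result reports whether it was added.
func (l *List) Append(value string) bool {
	if len(l.Values) >= l.Capacity || l.Contains(value) {
		return false
	}
	l.Values = append(l.Values, value)
	return true
}

func main() {
	players, err := NewList(2)
	if err != nil {
		panic(err)
	}
	fmt.Println(players.Append("xe9m"), players.Append("xe9m"), players.Append("9iuz"), players.Append("blue"))
	// true false true false
	fmt.Println(players.Values) // [xe9m 9iuz]
}
```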
Allocation filtering and sorting
- Allocation filter on count value (min, max)
- Allocation filter on count available capacity (min, max)
- Allocation filter on list available capacity (min, max)
- Allocation filter on whether a single value is contained in a list
- Allocation sorting / preference by a count value/list length, ascending or descending.
- Packed: Within the node.
- Distributed: Across the entire set.
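One way the filters above could be evaluated against a single GameServer is sketched below in Go. The types and function names are hypothetical; representing unset optional maximums as max int64, and treating a GameServer that lacks a filtered key as a non-match, are assumptions rather than settled behaviour. Sorting and the Packed/Distributed preferences are not modeled here.

```go
package main

import (
	"fmt"
	"math"
)

// CounterStatus and ListStatus are hypothetical stand-ins for a GameServer's
// counter and list status.
type CounterStatus struct{ Count, Capacity int64 }

type ListStatus struct {
	Capacity int64
	Values   []string
}

// CounterFilter mirrors the counter selector fields. Optional maximums that
// were not set are represented as math.MaxInt64.
type CounterFilter struct{ Min, Max, MinAvailable, MaxAvailable int64 }

// ListFilter mirrors the list selector fields. An empty Contains means
// "no membership filter".
type ListFilter struct {
	MinAvailable, MaxAvailable int64
	Contains                   string
}

// Matches checks the count value and the capacity left on a counter.
func (f CounterFilter) Matches(c CounterStatus) bool {
	capacity := c.Capacity
	if capacity == 0 {
		capacity = math.MaxInt64 // a capacity of 0 means no explicit limit
	}
	available := capacity - c.Count
	return c.Count >= f.Min && c.Count <= f.Max &&
		available >= f.MinAvailable && available <= f.MaxAvailable
}

// Matches checks the capacity left on a list and, optionally, membership.
func (f ListFilter) Matches(l ListStatus) bool {
	available := l.Capacity - int64(len(l.Values))
	if available < f.MinAvailable || available > f.MaxAvailable {
		return false
	}
	if f.Contains == "" {
		return true
	}
	for _, v := range l.Values {
		if v == f.Contains {
			return true
		}
	}
	return false
}

// MatchesCounters applies every counter filter in a selector. A GameServer
// that lacks a filtered key does not match.
func MatchesCounters(filters map[string]CounterFilter, counters map[string]CounterStatus) bool {
	for key, f := range filters {
		c, ok := counters[key]
		if !ok || !f.Matches(c) {
			return false
		}
	}
	return true
}

func main() {
	rooms := CounterFilter{Min: 4, Max: 20, MinAvailable: 0, MaxAvailable: 99}
	frogs := ListFilter{MinAvailable: 0, MaxAvailable: math.MaxInt64, Contains: "orange"}

	fmt.Println(MatchesCounters(
		map[string]CounterFilter{"rooms": rooms},
		map[string]CounterStatus{"rooms": {Count: 4, Capacity: 100}},
	)) // true: count 4 is within [4, 20] and 96 slots are left
	fmt.Println(frogs.Matches(ListStatus{Capacity: 1000, Values: []string{"blue", "green", "orange"}})) // true
}
```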
Fleets scheduling
- Fleet scale down sorting by a count value/list length, ascending or descending.
- Packed: Within the node.
- Distributed: Across the entire set.
Metrics
- Expose counter values and list lengths as gauge metrics, with a label for the key the count or list is set under.
- Expose counter and list capacities as gauge metrics, with a label for the key the count or list is set under.
Background
There have been many discussions and issues (and more on Slack) about weighted allocation, storing “session room” counts for use at allocation time, sorting on Fleet scale down, and more.
We’ve also always had a desire to be able to set some level of metrics through Agones from a GameServer as well.
- Weighted allocation scheduling · Issue #2114 · googleforgames/agones
- Add additional sorting criteria during Game Server Set scale down to determine which servers to terminate · Issue #2372 · googleforgames/agones
- https://agones.dev/site/docs/reference/agones_crd_api_reference/#agones.dev/v1.PlayerStatus
- https://agones.dev/site/docs/guides/player-tracking/
- https://agones.dev/site/docs/integration-patterns/high-density-gameservers/
- https://agones.dev/site/docs/integration-patterns/player-capacity/
- Proposal: RefCount allocation to host many room in same pod · Issue #1197 · googleforgames/agones
- Pass arbitrary metrics from gameserver -> opencensus · Issue #1037 · googleforgames/agones
- PlayerTracking: Be able to set GameServer capacity on Allocation · Issue #2670 · googleforgames/agones
- Metrics: Players · #1035 · googleforgames/agones
- Autoscaling Fleet base on Player Count · Issue #1034 · googleforgames/agones
Design ideas
Configuration
GameServers
Being able to set arbitrary counts and lists on a GameServer instance.
```yaml
apiVersion: "agones.dev/v1"
kind: GameServer
metadata:
  generateName: "simple-game-server-"
spec:
  ports:
  - name: default
    portPolicy: Dynamic
    containerPort: 7654
  template:
    spec:
      containers:
      - name: simple-game-server
        image: gcr.io/agones-images/simple-game-server:0.13
  counters: # list of counters. Key value below is the key for each counter.
    rooms: # key for the counter (room)
      default: 1 # initial value
      capacity: 100 # maximum possible count value
  lists: # list of lists.
    players: # key for this list (players)
      capacity: 100 # maximum number of items in a list
    frogs: # key for another list (frogs), with the default 1000 item capacity
```
GameServer Status
This is where the current count and list values, along with their capacities, are stored on the CRD. The values in the spec do not change once they have been initially declared.
```yaml
status:
  # .. usual status values
  counters: # count values
    rooms:
      count: 4 # current count for "rooms" key
      capacity: 100 # maximum value for "rooms" key
  lists: # list values
    players: # values for key "players"
      capacity: 100 # the current capacity as it has been set.
      values: # list of values set against this list
      - xe9m
      - 9iuz
    frogs: # values for key "frogs"
      values:
      - blue
      - green
      - orange
```
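Populating the status from the spec at creation time could look roughly like the following. This is a hypothetical sketch: the Go types mirror the YAML above, and it assumes an omitted list capacity (as for "frogs") arrives as 0 and is defaulted to 1000.

```go
package main

import "fmt"

// Hypothetical spec and status shapes mirroring the YAML above.
type CounterSpec struct{ Default, Capacity int64 }
type ListSpec struct{ Capacity int64 }

type CounterStatus struct{ Count, Capacity int64 }

type ListStatus struct {
	Capacity int64
	Values   []string
}

const defaultListCapacity = 1000

// PopulateStatus copies the declared counters and lists from the spec into
// the status when the GameServer is created. After that only the status
// changes; the spec values stay as declared.
func PopulateStatus(counters map[string]CounterSpec, lists map[string]ListSpec) (map[string]CounterStatus, map[string]ListStatus) {
	cs := make(map[string]CounterStatus, len(counters))
	for key, c := range counters {
		cs[key] = CounterStatus{Count: c.Default, Capacity: c.Capacity}
	}
	ls := make(map[string]ListStatus, len(lists))
	for key, l := range lists {
		capacity := l.Capacity
		if capacity == 0 {
			capacity = defaultListCapacity // e.g. "frogs" above
		}
		ls[key] = ListStatus{Capacity: capacity, Values: []string{}}
	}
	return cs, ls
}

func main() {
	cs, ls := PopulateStatus(
		map[string]CounterSpec{"rooms": {Default: 1, Capacity: 100}},
		map[string]ListSpec{"players": {Capacity: 100}, "frogs": {}},
	)
	fmt.Println(cs["rooms"], ls["players"].Capacity, ls["frogs"].Capacity) // {1 100} 100 1000
}
```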
Fleets
```yaml
apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
  name: simple-game-server
spec:
  replicas: 2
  priorities: # which gameservers in the Fleet are most important to keep around - impacts scale down logic
  - type: count # whether a count or a list. List uses the length as the value, count the current count value.
    key: rooms # The key to grab data from. If not found on the GameServer, GameServers with the key will have priority over those that do not.
    order: ascending # default is "ascending" so bigger number is better. "descending" would be "smaller number is better".
  template:
    spec:
      ports:
      - name: default
        containerPort: 7654
      template:
        spec:
          containers:
          - name: simple-game-server
            image: gcr.io/agones-images/simple-game-server:0.13
      counters: # list of counters. Key value below is the key for each counter.
        rooms: # key for counter (rooms)
          default: 1
          capacity: 100
      lists: # list of lists.
        players: # key for this list (players)
          capacity: 100 # set capacity
        frogs: # key for another list (frogs), with the default 1000 item capacity
```
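The Fleet priorities could order GameServers for scale down roughly as follows. In this hypothetical sketch (Priority, GameServer and SortByImportance are illustrative names), "ascending" means a bigger number is better, as described above, GameServers that have the key rank above those that do not, and later priorities only break ties. Scale down would then remove GameServers from the end of the sorted slice. The Packed and Distributed node-level strategies are not modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// Priority mirrors one entry of the `priorities` array.
type Priority struct {
	Type  string // "count" or "list"
	Key   string
	Order string // "ascending" (default, bigger is better) or "descending"
}

// GameServer is a hypothetical, trimmed-down view of a GameServer's status.
type GameServer struct {
	Name     string
	Counters map[string]int64    // current count per key
	Lists    map[string][]string // current values per key
}

// value returns the number a priority compares on, and whether the key exists.
func value(gs GameServer, p Priority) (int64, bool) {
	if p.Type == "list" {
		v, ok := gs.Lists[p.Key]
		return int64(len(v)), ok
	}
	v, ok := gs.Counters[p.Key]
	return v, ok
}

// SortByImportance orders GameServers from most to least important to keep.
func SortByImportance(gss []GameServer, priorities []Priority) {
	sort.SliceStable(gss, func(i, j int) bool {
		for _, p := range priorities {
			vi, oki := value(gss[i], p)
			vj, okj := value(gss[j], p)
			if oki != okj {
				return oki // having the key beats not having it
			}
			if vi == vj {
				continue // tie: fall through to the next priority
			}
			if p.Order == "descending" {
				return vi < vj // smaller number is better
			}
			return vi > vj // "ascending": bigger number is better
		}
		return false
	})
}

func main() {
	gss := []GameServer{
		{Name: "a", Counters: map[string]int64{"rooms": 2}},
		{Name: "b"},
		{Name: "c", Counters: map[string]int64{"rooms": 7}},
	}
	SortByImportance(gss, []Priority{{Type: "count", Key: "rooms", Order: "ascending"}})
	for _, gs := range gss {
		fmt.Println(gs.Name) // c, then a, then b (first to be scaled down)
	}
}
```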
Status
```yaml
status:
  # ... usual fleet status values
  counters: # aggregate counter values
    rooms:
      total: 43 # total of count values for key "rooms"
      capacity: 100 # total capacity count in all GameServers across the Fleet "rooms" key
  lists: # aggregate list values
    players:
      count: 58 # total number of list items in all GameServers across the Fleet under "players" key
      capacity: 200 # total capacity count in all GameServers across the Fleet "players" key
    frogs:
      count: 12
      capacity: 88
```
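The aggregate Fleet status values above are per-key sums over the Fleet's GameServers. A hypothetical Go sketch of that aggregation follows; note that it simply adds a counter capacity of 0 ("no limit") as 0, which is one of the details a real implementation would need to decide.

```go
package main

import "fmt"

// Hypothetical, trimmed-down GameServer status types.
type CounterStatus struct{ Count, Capacity int64 }

type ListStatus struct {
	Capacity int64
	Values   []string
}

type GameServerStatus struct {
	Counters map[string]CounterStatus
	Lists    map[string]ListStatus
}

// AggregateCounter and AggregateList mirror the Fleet status fields above.
type AggregateCounter struct{ Total, Capacity int64 }
type AggregateList struct{ Count, Capacity int64 }

// Aggregate sums, per key, counter values, list lengths and capacities across
// all GameServers of a Fleet.
func Aggregate(gss []GameServerStatus) (map[string]AggregateCounter, map[string]AggregateList) {
	counters := map[string]AggregateCounter{}
	lists := map[string]AggregateList{}
	for _, gs := range gss {
		for key, c := range gs.Counters {
			agg := counters[key]
			agg.Total += c.Count
			agg.Capacity += c.Capacity // a capacity of 0 ("no limit") is added as 0
			counters[key] = agg
		}
		for key, l := range gs.Lists {
			agg := lists[key]
			agg.Count += int64(len(l.Values))
			agg.Capacity += l.Capacity
			lists[key] = agg
		}
	}
	return counters, lists
}

func main() {
	counters, lists := Aggregate([]GameServerStatus{
		{
			Counters: map[string]CounterStatus{"rooms": {Count: 4, Capacity: 100}},
			Lists:    map[string]ListStatus{"players": {Capacity: 100, Values: []string{"xe9m", "9iuz"}}},
		},
		{
			Counters: map[string]CounterStatus{"rooms": {Count: 3, Capacity: 100}},
			Lists:    map[string]ListStatus{"players": {Capacity: 100, Values: []string{"x7un"}}},
		},
	})
	fmt.Println(counters["rooms"], lists["players"]) // {7 200} {3 200}
}
```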
FleetAutoscaling
Count based autoscaling
```yaml
apiVersion: "autoscaling.agones.dev/v1"
kind: FleetAutoscaler
metadata:
  name: fleet-autoscaler-count
spec:
  fleetName: fleet-example
  policy:
    type: Count # count based autoscaling
    count:
      # The key for the count value.
      key: rooms
      # Size of a buffer of counted items that are available in the Fleet.
      # it can be specified either in absolute (i.e. 5) or percentage format (i.e. 5%)
      bufferCount: 5
      # minimum aggregate count capacity that can be provided by this FleetAutoscaler.
      # if not specified, the actual minimum capacity will be bufferCount
      minCount: 10
      # maximum aggregate count capacity that can be provided by this FleetAutoscaler.
      # required
      maxCount: 100
```
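How the count policy turns these fields into a target is not spelled out above, so the following is only a hypothetical sketch. It assumes an absolute buffer is added on top of the current aggregate count, a percentage buffer keeps that share of the total capacity free, the result is clamped to minCount/maxCount (with a minCount of 0 standing for "not specified"), and, for the replica count, that every GameServer declares the same capacity for the key. A list-based policy would work the same way with list lengths and list capacities.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// DesiredCapacity returns the aggregate capacity a count-based autoscaler
// might target for one key (hypothetical):
//   - total is the Fleet's aggregate count for the key
//   - buffer is bufferCount, absolute ("5") or a percentage ("5%")
//   - minCount of 0 stands for "not specified"
func DesiredCapacity(total int64, buffer string, minCount, maxCount int64) (int64, error) {
	var desired int64
	if strings.HasSuffix(buffer, "%") {
		pct, err := strconv.ParseInt(strings.TrimSuffix(buffer, "%"), 10, 64)
		if err != nil || pct < 1 || pct > 99 {
			return 0, fmt.Errorf("invalid percentage buffer %q", buffer)
		}
		// Keep pct% of capacity free: ceil(total * 100 / (100 - pct)).
		desired = (total*100 + (100 - pct) - 1) / (100 - pct)
	} else {
		abs, err := strconv.ParseInt(buffer, 10, 64)
		if err != nil || abs < 0 {
			return 0, fmt.Errorf("invalid absolute buffer %q", buffer)
		}
		desired = total + abs
		if minCount == 0 {
			minCount = abs // "if not specified, the actual minimum capacity will be bufferCount"
		}
	}
	if desired < minCount {
		desired = minCount
	}
	if desired > maxCount {
		desired = maxCount
	}
	return desired, nil
}

// Replicas converts an aggregate capacity into a number of GameServers,
// assuming every GameServer declares the same capacity for the key.
func Replicas(desiredCapacity, perGameServer int64) int64 {
	return (desiredCapacity + perGameServer - 1) / perGameServer
}

func main() {
	// 43 rooms in use, bufferCount 5, minCount 10, maxCount 100,
	// and 100 rooms per GameServer as in the Fleet example.
	desired, err := DesiredCapacity(43, "5", 10, 100)
	if err != nil {
		panic(err)
	}
	fmt.Println(desired, Replicas(desired, 100)) // 48 1
}
```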
List based autoscaling
```yaml
apiVersion: "autoscaling.agones.dev/v1"
kind: FleetAutoscaler
metadata:
  name: fleet-autoscaler-list
spec:
  fleetName: fleet-example
  policy:
    type: List # List based autoscaling.
    list:
      # The key for the list.
      key: players
      # Size of a buffer based on the list capacity that is available over the current aggregate list length in the Fleet.
      # It can be specified either in absolute (i.e. 5) or percentage format (i.e. 5%)
      bufferLength: 5
      # minimum aggregate list capacity that can be provided by this FleetAutoscaler.
      # if not specified, the actual minimum capacity will be bufferLength
      minLength: 10
      # maximum aggregate list capacity that can be provided by this FleetAutoscaler.
      # required
      maxLength: 100
```
Allocations
```yaml
kind: GameServerAllocation
spec:
  # Which gameservers in the selector set are most important to keep around - impacts which GameServer is checked first.
  # First item on the array of priorities is the most important for sorting.
  priorities:
  - type: count # whether a count or a list. List uses the length as the value, count the current count value.
    key: rooms # The key to grab data from. If not found on the GameServer, has no impact.
    order: ascending # default is "ascending" so bigger number is better. "descending" would be "smaller number is better".
  selectors:
  - matchLabels:
      agones.dev/fleet: simple-game-server
    counters: # filter on counter min and max values
      rooms: # use "rooms" key values
        min: 4 # filters on count values (optional, defaults to 0)
        max: 20 # (optional, defaults to max int)
        minAvailable: 0 # filters on the capacity left on a GameServer (optional, defaults to 0)
        maxAvailable: 99 # (optional, defaults to max int)
    lists: # filter on lists
      players:
        minAvailable: 0 # filters on the capacity left on a GameServer
        maxAvailable: 99
      frogs:
        contains: orange # filter on if this value is found in the list.
  counters: # apply an action to a counter
    rooms:
      action: increment # "increment" or "decrement" a count.
      amount: 1 # how much by. defaults to 1.
  lists: # apply an action to a list.
    players:
      append: # (optional) append these values to the list
      - x7un
      - 8inz
      capacity: 40 # (optional) change the capacity of this list on the GameServer to this value.
```
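Once a GameServer has been chosen, the counter and list actions could be applied roughly like this. The sketch is hypothetical: it treats an Amount of 0 as the default of 1, ignores actions for keys the GameServer does not have, and applies a capacity change before the appends, which is an assumed ordering the text above does not specify.

```go
package main

import "fmt"

// Hypothetical, trimmed-down status types.
type CounterStatus struct{ Count, Capacity int64 }

type ListStatus struct {
	Capacity int64
	Values   []string
}

// CounterAction and ListAction mirror the top-level `counters` and `lists`
// action fields of the GameServerAllocation.
type CounterAction struct {
	Action string // "increment" or "decrement"
	Amount int64  // 0 is treated as the default of 1
}

type ListAction struct {
	Append   []string
	Capacity *int64 // nil when the capacity should not change
}

// ApplyCounterAction applies an action to the chosen GameServer's counter.
// A missing key, or a change that would cross 0 or the capacity, is ignored.
func ApplyCounterAction(counters map[string]CounterStatus, key string, a CounterAction) bool {
	c, ok := counters[key]
	if !ok {
		return false
	}
	amount := a.Amount
	if amount == 0 {
		amount = 1
	}
	if amount < 0 {
		return false
	}
	limit := c.Capacity
	if limit == 0 {
		limit = 1<<63 - 1 // a capacity of 0 means no explicit limit
	}
	switch {
	case a.Action == "increment" && c.Count <= limit-amount:
		c.Count += amount
	case a.Action == "decrement" && c.Count >= amount:
		c.Count -= amount
	default:
		return false
	}
	counters[key] = c
	return true
}

// ApplyListAction changes the capacity first (assumed ordering), then appends
// each value, silently skipping duplicates and values that do not fit.
func ApplyListAction(lists map[string]ListStatus, key string, a ListAction) bool {
	l, ok := lists[key]
	if !ok {
		return false
	}
	if a.Capacity != nil {
		l.Capacity = *a.Capacity
	}
	for _, v := range a.Append {
		if int64(len(l.Values)) >= l.Capacity || contains(l.Values, v) {
			continue
		}
		l.Values = append(l.Values, v)
	}
	lists[key] = l
	return true
}

func contains(values []string, v string) bool {
	for _, x := range values {
		if x == v {
			return true
		}
	}
	return false
}

func main() {
	counters := map[string]CounterStatus{"rooms": {Count: 4, Capacity: 100}}
	lists := map[string]ListStatus{"players": {Capacity: 100, Values: []string{"xe9m"}}}
	capacity := int64(40)

	ApplyCounterAction(counters, "rooms", CounterAction{Action: "increment", Amount: 1})
	ApplyListAction(lists, "players", ListAction{Append: []string{"x7un", "8inz"}, Capacity: &capacity})
	fmt.Println(counters["rooms"], lists["players"]) // {5 100} {40 [xe9m x7un 8inz]}
}
```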
SDK
The SDK will batch operations every 1 second for performance reasons, but changes made through the SDK will be atomically accurate when read back through the SDK. Changes made through Allocation or the Kubernetes API will be eventually consistent when coming back to the SDK.
Question: In PlayerTracking, we told users to either use the K8s API or use the SDK commands. Can we do that here? Should we do that here? I’d like to avoid it with the strategy written above.
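To make the batching strategy concrete, here is a hypothetical Go sketch (not the SDK server's actual code): SDK calls update a local view immediately, so reads through the SDK see them at once, and the accumulated deltas are written back to the GameServer status in one call per interval. Batcher, Flush and Run are illustrative names, and the write callback stands in for whatever updates the CRD.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Batcher keeps a local view of counters and queues deltas for the CRD.
type Batcher struct {
	mu      sync.Mutex
	local   map[string]int64 // values as seen through the SDK
	pending map[string]int64 // deltas not yet written to the CRD
	write   func(deltas map[string]int64)
}

func NewBatcher(write func(deltas map[string]int64)) *Batcher {
	return &Batcher{local: map[string]int64{}, pending: map[string]int64{}, write: write}
}

// Increment records a change locally and queues it for the next write.
func (b *Batcher) Increment(key string, amount int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.local[key] += amount
	b.pending[key] += amount
}

// Get returns the SDK's local view of a counter.
func (b *Batcher) Get(key string) int64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.local[key]
}

// Flush hands every pending delta to write in a single call, then clears them.
func (b *Batcher) Flush() {
	b.mu.Lock()
	deltas := b.pending
	b.pending = map[string]int64{}
	b.mu.Unlock()
	if len(deltas) > 0 {
		b.write(deltas)
	}
}

// Run flushes once per interval (e.g. every second) until stop is closed.
func (b *Batcher) Run(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			b.Flush()
		case <-stop:
			b.Flush()
			return
		}
	}
}

func main() {
	b := NewBatcher(func(deltas map[string]int64) {
		fmt.Println("write to GameServer status:", deltas)
	})
	stop := make(chan struct{})
	done := make(chan struct{})
	go func() {
		b.Run(time.Second, stop)
		close(done)
	}()

	for i := 0; i < 3; i++ {
		b.Increment("rooms", 1)
	}
	fmt.Println("SDK sees:", b.Get("rooms")) // SDK sees: 3
	close(stop)                              // triggers a final flush
	<-done
}
```

The trade-off is visible in the sketch: SDK reads are immediately consistent with SDK writes, but the CRD (and therefore Allocation and other API clients) only sees the changes after the next flush.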
Counter
All functions will error if the key was not predefined in the GameServer resource on creation.
Alpha().CountGet(key): integer
Returns the current count under the provided key.
Alpha().CountIncrement(key, amount): boolean
Increment a counter by a given amount. Will max at max(int64).
Will execute the increment operation against the current CRD value.
Returns false if the count is at the current capacity (to the latest knowledge of the SDK), and no increment will occur.
Note: A potential race condition exists if count values are set from both the SDK and the K8s API (Allocation or otherwise). Since the SDK increment operation back to the CRD value is batched and asynchronous, any value incremented past the capacity will be silently truncated.
Alpha().CountDecrement(key, amount): boolean
Decrements the current count by the provided amount. Will not go below 0.
Will execute the decrement operation against the current CRD value.
Returns false if the count is at 0 (to the latest knowledge of the SDK), and no decrement will occur.
Alpha().CountSet(key, amount)
Sets a count at a given value. Use with care, as this will overwrite any previous invocations’ value.
Alpha().CountSetCapacity(key, capacity)
Update the capacity for a given count. A capacity of 0 means no explicit capacity limit (i.e. max int64).
Alpha().CountGetCapacity(key): integer
Get the current capacity for this specific count.
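The race condition described in the CountIncrement note can be reproduced with a toy model. In this hypothetical sketch, the SDK checks capacity against its local view only, an Allocation increments the CRD value in the meantime, and the batched delta is then truncated at capacity when written back. It assumes a non-zero capacity.

```go
package main

import "fmt"

// CRDCounter is a hypothetical counter as stored in the GameServer status.
type CRDCounter struct{ Count, Capacity int64 }

// SDKCounter is the SDK's local view of the counter plus the delta it has not
// written back to the CRD yet.
type SDKCounter struct {
	Local   CRDCounter
	Pending int64
}

// CountIncrement checks capacity against the SDK's local knowledge only.
func (s *SDKCounter) CountIncrement(amount int64) bool {
	if s.Local.Count+amount > s.Local.Capacity {
		return false
	}
	s.Local.Count += amount
	s.Pending += amount
	return true
}

// FlushTo applies the pending delta to the latest CRD value, truncating at
// capacity, and then refreshes the local view from the result.
func (s *SDKCounter) FlushTo(crd *CRDCounter) {
	crd.Count += s.Pending
	if crd.Count > crd.Capacity {
		crd.Count = crd.Capacity // the extra increment is silently lost
	}
	s.Pending = 0
	s.Local = *crd
}

func main() {
	crd := CRDCounter{Count: 99, Capacity: 100}
	sdk := SDKCounter{Local: crd}

	ok := sdk.CountIncrement(1) // true: the SDK believes there is room
	crd.Count++                 // meanwhile an Allocation increments the CRD to 100
	sdk.FlushTo(&crd)           // 100 + 1 is truncated back to 100

	fmt.Println(ok, crd.Count) // true 100
}
```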
Lists
All functions will error if the key was not predefined in the GameServer resource on creation.
Alpha().ListAppend(key, value): boolean
Appends the provided value to the list.
Will retrieve the current CRD value before executing the append operation.
Returns false, if the value already exists in the list, or if the list is already at capacity (to the latest knowledge of the SDK).
Note: A potential race condition exists if list values are set from both the SDK and the K8s API (Allocation or otherwise). Since the SDK append operation back to the CRD value is batched and asynchronous, any value appended past the capacity will be silently truncated.
Alpha().ListDelete(key, value): boolean
Delete the specified value from the list.
Returns false if the value is not found in the list (to the latest knowledge of the SDK).
Alpha().ListSetCapacity(key, capacity)
Update the capacity for a given list. Capacity must be between 1 and 1000.
Alpha().ListGetCapacity(key): integer
Get the current capacity for this specific list.
Alpha().ListContains(key, value): boolean
Returns true if the given list contains a provided value.
Alpha().ListLength(key): integer
Returns the current length of the given list.
Alpha().ListGet(key): []string
Returns the contents of the given list.
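A hypothetical in-memory stand-in for the list functions above is sketched below, mainly to show the error and boolean semantics. The document's signatures return plain booleans; in this Go sketch an error return is added for keys that were not predefined, and only a subset of the functions is shown. None of these names are the actual SDK API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUndefinedKey is returned for a key that was not declared on the GameServer.
var ErrUndefinedKey = errors.New("list key was not defined on the GameServer")

type list struct {
	capacity int64
	values   []string
}

// Lists is a hypothetical in-memory stand-in for the list half of the SDK.
type Lists struct{ byKey map[string]*list }

func (l *Lists) get(key string) (*list, error) {
	ls, ok := l.byKey[key]
	if !ok {
		return nil, fmt.Errorf("%w: %q", ErrUndefinedKey, key)
	}
	return ls, nil
}

// ListAppend returns false for a duplicate value or a full list.
func (l *Lists) ListAppend(key, value string) (bool, error) {
	ls, err := l.get(key)
	if err != nil {
		return false, err
	}
	if int64(len(ls.values)) >= ls.capacity {
		return false, nil
	}
	for _, v := range ls.values {
		if v == value {
			return false, nil
		}
	}
	ls.values = append(ls.values, value)
	return true, nil
}

// ListDelete returns false if the value is not in the list.
func (l *Lists) ListDelete(key, value string) (bool, error) {
	ls, err := l.get(key)
	if err != nil {
		return false, err
	}
	for i, v := range ls.values {
		if v == value {
			ls.values = append(ls.values[:i], ls.values[i+1:]...)
			return true, nil
		}
	}
	return false, nil
}

// ListSetCapacity accepts capacities between 1 and 1000.
func (l *Lists) ListSetCapacity(key string, capacity int64) error {
	ls, err := l.get(key)
	if err != nil {
		return err
	}
	if capacity < 1 || capacity > 1000 {
		return fmt.Errorf("capacity %d must be between 1 and 1000", capacity)
	}
	ls.capacity = capacity
	return nil
}

// ListLength returns the current number of values in the list.
func (l *Lists) ListLength(key string) (int, error) {
	ls, err := l.get(key)
	if err != nil {
		return 0, err
	}
	return len(ls.values), nil
}

func main() {
	sdk := &Lists{byKey: map[string]*list{"players": {capacity: 100}}}
	added, _ := sdk.ListAppend("players", "x7un")
	again, _ := sdk.ListAppend("players", "x7un")
	_, err := sdk.ListAppend("rooms", "abc")
	fmt.Println(added, again, errors.Is(err, ErrUndefinedKey)) // true false true
}
```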
Metrics
Metrics should be exported in aggregate across all GameServers, using the key that the metric is stored under as a label, giving us the ability to export basic numeric values as gauge metrics.
The Fleet name will also be attached to each metric as a label.
Counters
Total of all counters on all GameServers, by key
agones_gameservers_counter_total[key=${key}]
Total count capacity of all GameServers, by key
agones_gameservers_counter_capacity_total[key=${key}]
Lists
Total number of items in all lists on all GameServers, by key
agones_gameservers_list_length_total[key=${key}]
Total list capacity of all GameServers, by key
agones_gameservers_list_capacity_total[key=${key}]
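The four gauges above could be computed per Fleet and key roughly as follows. This hypothetical sketch uses only the standard library and prints series in a Prometheus-like name{labels} form for readability; the label names fleet and key are assumptions, and a real implementation would register these values with a metrics library instead.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical, trimmed-down GameServer view with the Fleet it belongs to.
type CounterStatus struct{ Count, Capacity int64 }

type ListStatus struct {
	Capacity int64
	Values   []string
}

type GameServer struct {
	Fleet    string
	Counters map[string]CounterStatus
	Lists    map[string]ListStatus
}

// Gauges sums each metric per (fleet, key) label pair.
func Gauges(gss []GameServer) map[string]int64 {
	out := map[string]int64{}
	add := func(name, fleet, key string, v int64) {
		out[fmt.Sprintf("%s{fleet=%q,key=%q}", name, fleet, key)] += v
	}
	for _, gs := range gss {
		for key, c := range gs.Counters {
			add("agones_gameservers_counter_total", gs.Fleet, key, c.Count)
			add("agones_gameservers_counter_capacity_total", gs.Fleet, key, c.Capacity)
		}
		for key, l := range gs.Lists {
			add("agones_gameservers_list_length_total", gs.Fleet, key, int64(len(l.Values)))
			add("agones_gameservers_list_capacity_total", gs.Fleet, key, l.Capacity)
		}
	}
	return out
}

// Render formats the gauges one per line, sorted for stable output.
func Render(g map[string]int64) string {
	lines := make([]string, 0, len(g))
	for series, v := range g {
		lines = append(lines, fmt.Sprintf("%s %d", series, v))
	}
	sort.Strings(lines)
	return strings.Join(lines, "\n")
}

func main() {
	fmt.Println(Render(Gauges([]GameServer{
		{Fleet: "simple-game-server", Counters: map[string]CounterStatus{"rooms": {Count: 4, Capacity: 100}}},
		{Fleet: "simple-game-server", Counters: map[string]CounterStatus{"rooms": {Count: 3, Capacity: 100}}},
	})))
}
```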
Dashboards
Since we are using labels, we can create some generic dashboards with dropdowns for each Fleet and for each counter and list key.
Critical User Journeys
High-level summaries of some user journeys that could be enabled by this new functionality.
Player Tracking
Player tracking could be implemented in essentially the same way that is possible now, but we could also take an approach that could reserve player connections at allocation time.
An end user could now add a player at allocation time to the GameServer, blocking that space for the player. A gameserver binary could watch for that addition, then wait a determined amount of time before removing it from a “players” list if that player has not yet connected.
For example:
```yaml
kind: GameServerAllocation
spec:
  selectors:
  - matchLabels:
      agones.dev/fleet: simple-game-server
    lists: # filter on lists
      players:
        minAvailable: 1 # filters on the capacity left on a GameServer: at least one free slot for the player
        maxAvailable: 99
    gameServerState: Allocated
  - matchLabels:
      agones.dev/fleet: simple-game-server
    gameServerState: Ready
  lists: # apply an action to a list.
    players:
      append: # (optional) append these values to the list
      - x7un
```
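On the game server side, the reservation timeout described above could be tracked with a small helper. This is a hypothetical sketch: the binary is assumed to feed it the current "players" values (for example from a GameServer watch callback), mark players as connected when they join, and call ListDelete for each ID that Expired returns.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Reservations records when each player ID first appeared in the "players"
// list and reports reservations whose player never connected.
type Reservations struct {
	timeout   time.Duration
	firstSeen map[string]time.Time
	connected map[string]bool
}

func NewReservations(timeout time.Duration) *Reservations {
	return &Reservations{timeout: timeout, firstSeen: map[string]time.Time{}, connected: map[string]bool{}}
}

// Observe is fed the current "players" values, e.g. from a watch callback.
func (r *Reservations) Observe(values []string, now time.Time) {
	for _, id := range values {
		if _, ok := r.firstSeen[id]; !ok {
			r.firstSeen[id] = now
		}
	}
}

// Connected marks a player as having actually joined.
func (r *Reservations) Connected(id string) { r.connected[id] = true }

// Expired returns, sorted, the IDs reserved for at least the timeout without
// connecting, and stops tracking them so each is reported only once.
func (r *Reservations) Expired(now time.Time) []string {
	var out []string
	for id, seen := range r.firstSeen {
		if !r.connected[id] && now.Sub(seen) >= r.timeout {
			out = append(out, id)
			delete(r.firstSeen, id)
		}
	}
	sort.Strings(out)
	return out
}

// epoch is an arbitrary fixed start time so the example is deterministic.
var epoch = time.Unix(0, 0)

func main() {
	r := NewReservations(30 * time.Second)
	r.Observe([]string{"x7un", "8inz"}, epoch)
	r.Connected("8inz")
	fmt.Println(r.Expired(epoch.Add(10 * time.Second))) // []
	fmt.Println(r.Expired(epoch.Add(31 * time.Second))) // [x7un]
}
```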
Room based High Density Game Servers
This could now be handled as an integer value as a count, or as a list with individual room ids.
A count based Allocation could look something like:
```yaml
kind: GameServerAllocation
spec:
  priorities: # which gameservers in the selector set are most important to keep around - impacts which GameServer is checked first.
  - type: count # whether a count or a list. List uses the length as the value, count the current count value.
    key: rooms # The key to grab data from. If not found on the GameServer, has no impact.
    order: ascending # default is "ascending" so bigger number is better. "descending" would be "smaller number is better".
  selectors:
  - matchLabels:
      agones.dev/fleet: simple-game-server
    counters: # filter on counter min and max values
      rooms: # one room available, against capacity
        minAvailable: 1
        maxAvailable: 1
    gameServerState: Allocated
  - matchLabels:
      agones.dev/fleet: simple-game-server
    gameServerState: Ready
  counters: # apply an action to a counter
    rooms:
      action: increment # "increment" or "decrement" a count.
      amount: 1 # how much by. defaults to 1.
```
This would prioritise allocation to GameServers that have more rooms currently running, and increment the room count at allocation time, which could be picked up by SDK.WatchGameServer().
A list based Allocation could look something like:
```yaml
kind: GameServerAllocation
spec:
  priorities: # which gameservers in the selector set are most important to keep around - impacts which GameServer is checked first.
  - type: list # whether a count or a list. List uses the length as the value, count the current count value.
    key: rooms # The key to grab data from. If not found on the GameServer, has no impact.
    order: ascending # default is "ascending" so bigger number is better. "descending" would be "smaller number is better".
  selectors:
  - matchLabels:
      agones.dev/fleet: simple-game-server
    lists: # filter on lists
      rooms:
        minAvailable: 1 # 1 room available, please
        maxAvailable: 1
    gameServerState: Allocated
  - matchLabels:
      agones.dev/fleet: simple-game-server
    gameServerState: Ready
  lists: # apply an action to a list.
    rooms:
      append: # (optional) append these values to the list
      - x7un
```
If you then wanted to allocate to the GameServer with a specific room session, you could do the following:
```yaml
kind: GameServerAllocation
spec:
  selectors:
  - matchLabels:
      agones.dev/fleet: simple-game-server
    lists: # filter on lists
      rooms:
        contains: x7un # filter on if this value is found in the list.
    gameServerState: Allocated
  - matchLabels:
      agones.dev/fleet: simple-game-server
    gameServerState: Ready
```
Note: An end user could still use the “label locking” method for high density game servers as well. This just provides another way to solve the same problem that may be more applicable for some use cases.
Game Specific Weight allocation
With this new functionality, if you wanted to prioritise Allocation based on how many blueberries were available in your game server (or any arbitrary thing), you could now do this as well. I’ve had conversations with people on how to preferentially “Allocate to the most interesting GameServer”, and this would allow you to do exactly that, through arbitrary counter tracking at the GameServer level.
For example:
```yaml
kind: GameServerAllocation
spec:
  priorities: # which gameservers in the selector set are most important to keep around - impacts which GameServer is checked first.
  - type: count # whether a count or a list. List uses the length as the value, count the current count value.
    key: blueberries # The key to grab data from. If not found on the GameServer, has no impact.
    order: ascending # more blueberries is better
  selectors:
  - matchLabels:
      agones.dev/fleet: simple-game-server
    gameServerState: Allocated
  - matchLabels:
      agones.dev/fleet: simple-game-server
    gameServerState: Ready
```
The blueberries counter could then be incremented and decremented with Alpha().CountIncrement(key, amount) and Alpha().CountDecrement(key, amount) from within the game server binary as needed.
Alternatives considered
We could continue having specific integrations for each specific use case -- much like we did for player tracking. Personally, this is what often dissuaded me from adding more specific solutions to specific problems in many of the tickets above -- their specificity, i.e. “This solution works for this specific problem”. I personally prefer more generic solutions that can power a wide variety of use cases. I genuinely believe that Agones’ power comes from its configurability and flexibility. That tradeoff does come with a higher cost for integration and greater overall complexity of the stack, but I don’t think the project would be as successful as it is without that flexibility.
I think the difference with player tracking was that it felt generic “enough” across use cases that it made sense. But I think this new approach is even more generic, and allows for a much wider set of use cases (probably ones we haven’t thought of yet), without the need to build out yet another CRD and SDK implementation, and without sacrificing capability (in fact I think it adds capability). Which is also why I’m quite excited about it.
Work Items
List of individual work items on this design, so it doesn't seem so overwhelming 😃
API Surfaces
This is not implementation, this is creating placeholders for data, CRD structures, proto API definitions, and stubs for SDK methods.
- [x] Feature Flag creation
- [x] CRD Updates
- [x] GameServer CRD updates
- [x] GameServerSet CRD updates
- [x] Fleet CRD updates
- [x] FleetAutoscaling CRD updates
- [x] GameServerAllocation CRD Updates
- [x] .proto updates
- [x] Allocation .proto updates
- [x] Alpha SDK .proto updates and stub methods on SdkServer
Implementation
Building functionality on top of the API surfaces that have been built out above.
- [x] Defaults
- [x] Defaults for counts on GameServerSpec
- [x] Defaults for lists on GameServerSpec
- [x] Population of GameServer -> Status on creation
- [x] Validation
- [x] Validation for counts on GameServerSpec
- [x] Validation for lists on GameServerSpec
- [x] Fleets
- [x] Fleet status aggregate values (also with GameServerSet)
- [x] Fleet scale down prioritisation
- [x] Autoscaling
- [x] FleetAutoscaling based on a count
- [x] FleetAutoscaling based on a list
- [x] GameServerAllocation
- [x] Conversion from .proto allocation to a GameServerAllocation
- [x] GameServer selection prioritisation
- [x] Allocation filtering on counts
- [x] Allocation filtering on lists
- [x] Allocation actions on counts (increment / decrement)
- [x] Allocation actions on lists (append)
- [x] Allocation change capacity on counts
- [x] Allocation change capacity on lists
- [x] SDK Implementation
- [x] Write Go SDK stubs for Count functions
- [x] Write Go SDK stubs for List functions
- [x] Update Go simple-game-server to have commands for Count and List SDK methods
- [x] Implement Count functions in SDKServer, and write e2e tests
- [x] Implement List functions in SDKServer, and write e2e tests
- [x] Write SDK conformance tests for Go SDK
- [x] Metrics
- [x] Expose metrics
- [x] (Optional) Create a generic dashboard based on the labels we use with our metrics.
- [x] Other language SDKs
- [x] Rust SDK implementation and conformance tests
- [x] C# SDK implementation and conformance tests
- [x] node.js SDK implementation and conformance tests
- [x] REST conformance tests
- [x] CPP implementation and conformance tests
- [x] Unity implementation and conformance tests
- [x] #3651
Calling on people I think might find this interesting, since this is a big idea 😄 : @tenevdev , @highlyunavailable , @neuecc , @castaneai , @sisso , @issotina , @foxydevloper
This is on the agenda for the community meeting tomorrow so if you have opinions / want to discuss with real time feedback we would love to see you there.
I just realised, I didn't add a section on Fleet Autoscaling! I'll amend that shortly.
We are looking to implement Room based High Density Game Servers and would like to provide feedback related to this.
About Counters
Question: What if you attempt to decrement below 0? Should it silently fail at 0, or should it filter out GameServers at 0?
If it attempts to decrement the counter below 0, we would like it to filter out GameServers. In particular, in our case, we may want it to be filtered based on whether it has capacity or not. In other words, if it is smaller than the min or larger than the max of the counter, we would like the game server to be filtered. This is the same idea as the lists minAvailable and maxAvailable.
Question: What if an allocation attempts to increment a value on a GameServer that doesn’t have the counter? Should it automatically filter to GameServers that have a counter with the provided key?
We would like to filter GameServers that have a counter with the provided key. However, it is also strange to filter by something that is not in the selectors, so if we want to add a counters field, we may want to validate it so that the same field is also included in the selectors.
About Lists
Question: What if you attempt to add a list that is at capacity? Does it silently fail, does it filter out any GameServers that don’t have room? Something else?
We would like it to filter out any GameServers that don’t have room. In our case, we would like to be able to filter out game servers based on whether or not they are at capacity at the time of allocation, so that we can manage room capacity for the same game server with a high degree of accuracy.
Question: What if an allocation attempts to append a value to a GameServer that doesn’t have the list? Should it automatically filter to GameServers that have a list with the provided key?
Same as counters, we want it to filter by the GameServer that has the key.
About SDK
Question: In PlayerTracking, we told users to either use the K8s API or use the SDK commands. Can we do that here? Should we do that here? I’d like to avoid it with the strategy written above.
I didn’t understand this question, could you please elaborate?
About Critical User Journeys
In the Room based High Density Game Servers example, the StateAllocationFilter is not used. I think this implies that if we use counters or lists as the selector, the search will also include GameServers in the Allocated state. However, since the advantage of this feature is its high flexibility, I felt that it would be better not to infer the state of GameServer just because counters and lists are used, so that the flexibility would not be sacrificed.
Just dropped several edits to remove some questions based on the above and internal feedback. PTAL.
Summary:
- General:
- Make note that this isn't a metrics service, but the usage of metrics is a byproduct of the functionality.
- Little clarification tweaks across the document.
- Counters
- Made a decision on what to do with counters that overflow below 0, or greater than max(int64)
- Also made a decision that the user must be explicit on filtering for room to increment or decrement a counter
- Allocations
- Fixed that I totally missed gameServerState in the allocations. Reworked the examples.
To see a diff, use the edit history button:

@katsew to respond to your questions directly:
If it attempts to decrement the counter below 0, we would like it to filter out GameServers. We would like to filter GameServers that have a counter with the provided key.
As per above, you would need to tell the allocation explicitly, with min and/or max counter filter values on the allocation selectors.
However, it is also strange to filter by something that is not in the selectors, so if we want to add a counters field, we may want to validate it so that the same field is also included in the selectors.
I'm thinking that there might be some implicit filtering there (i.e. if you attempt to allocate on counter "foo" and it doesn't exist on the GameServer, the system would attempt to increment, fail, and then move on to another GameServer (or it may be smart enough to pre-check)). This seems reasonable. If an allocation can't perform a list/counter action on a GameServer, then the GameServer can't be moved to Allocated, hence it would get skipped. I'll write some words to this effect.
We would like it to filter out any GameServers that don’t have room.
You can choose if you want to use a list with a capacity, or just have a counter that lists how many rooms are left. It's up to you.
Same as counters, we want it to filter by the GameServer that has the key.
In that case, a list of room id tokens with a capacity seems like the appropriate choice for your use case, since you can filter on a room id within allocated game servers with this functionality.
Question: In PlayerTracking, we told users to either use the K8s API or use the SDK commands. Can we do that here? Should we do that here? I’d like to avoid it with the strategy written above.
Ah, this is an interesting point if you aren't aware of how the internals of K8s work.
Essentially everything in k8s is eventually consistent, and therefore so is Agones. It allows the entire system to be self healing even if the control plane goes down for a time.
So SDK commands are async (they go into a queue once the SDK command has been fired), and with this functionality it's entirely possible for an Allocation or a K8s API command to change a list or counter value at the same time. So if people are doing both, it's entirely likely that the count/list values in an SDK will be out of sync with what's in a CRD, and vice versa, because everything is eventually consistent.
For Player Tracking, we told people "pick one path, so you don't have this issue". Here we are giving people lots of different options, and we'll need to be very explicit about what each of the tradeoffs are so that unexpected issues don't arise for end users.
Did that make a certain amount of sense?
In the Room based High Density Game Servers example, the StateAllocationFilter is not used.
OMG. I totally missed that I didn't add those.
That functionality definitely would still work, I just wasn't thinking and forgot to add it 🤦🏻 thanks for the excellent catch!
Please let me know if any of that didn't make any sense.
For Player Tracking, we told people "pick one path, so you don't have this issue". Here we are giving people lots of different options, and we'll need to be very explicit about what each of the tradeoffs are so that unexpected issues don't arise for end users. Did that make a certain amount of sense?
Yes, that totally made sense, thank you. This is what I really must care about within the implementation, so it's nice to have notes about it 😌
Was chatting with @roberthbailey, and he raised an interesting point.
In Player Tracking, Lists were essentially treated as Sets (i.e. every value was unique in the List). If you add a playerId that had previously been added to the set, it was treated as a no-op.
Do we do that here as well with Lists (maybe rename them to Sets?), or since we're aiming for a more generic implementation, do we allow duplicate values in a List?
🤔 or do we need to add a setting to a List, something like unique: true to basically turn it into a set, and therefore it only maintains unique values.
What do people think?
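For comparison, a minimal Go sketch of the two behaviours under discussion, assuming a hypothetical `SketchList` type with a `Unique` flag: with `Unique` set, re-adding an existing value is a no-op (as in Player Tracking); without it, duplicates are kept. The boolean return mirrors `PlayerConnect` reporting whether anything changed; none of this is Agones code.

```go
package main

// SketchList is a hypothetical list with an optional uniqueness flag.
type SketchList struct {
	Unique bool
	Values []string
}

// Append adds value in insertion order and reports whether the list changed.
// With Unique set, adding a value that is already present is a no-op.
func (l *SketchList) Append(value string) bool {
	if l.Unique {
		for _, v := range l.Values {
			if v == value {
				return false
			}
		}
	}
	l.Values = append(l.Values, value)
	return true
}
```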
I hadn't thought about the fact that we treated lists of players as having to be unique, but I guess we didn't expect to have the same player join a game session twice.
If we are making the lists more generic, we should think about whether there are scenarios where having duplicate values makes sense. Also, what do duplicate values mean in terms of allocation requests? Presumably you would always end up checking for at least one occurrence of the string in the list.
@markmandel Hi, thanks for the mentions. This is an interesting proposal! I have a couple of suggestions and concerns.
- The Allocator's status codes should be more fine-grained, given how complex the allocation filter conditions can get. It would be even better if the reason for a failure were also exposed.
- As @katsew said, the issue of eventual consistency is very complex and difficult. The documentation needs to address this issue in detail. If possible, it might be a good idea to publish a demo project as an example implementation. Also, game developers are generally not familiar with the internals of k8s. Therefore, rather than using etcd on k8s, they may choose to implement their own game server management using an external RDB with strong consistency. We would like to make it clear in the documentation that such other choices are also available.
I am very happy to see Agones add more features to address even more use cases. Thank you!
Therefore, rather than using etcd on k8s, they may choose to implement their own game server management using an external RDB with strong consistency.
I'm not sure if it should be in scope for this proposal, but it would be really interesting to see what that looks like - which parts are owned by Agones and which parts are split out. It might help us design a better solution that more seamlessly integrates with a solution that leverages an external RDB.
I think we should go with lists (allowing duplicate values) to cover more scenarios than sets would. However, our case needs to maintain unique values like Player Tracking, so it would be nice to have an option for that. Or maybe publishing an example implementation that deduplicates items in the list is enough.
Related to this, I'm wondering what the expected behavior of Alpha().ListDelete(key, value) is.
Does it delete all matching values for the key, or only the first one found in the list?
Oooh, that's a good question also. I think for lists, it would have to be a single value.
We could implement an Alpha().ListDelete(key, value, [one|all]) kind of option in the SDK (defaulting to "one", depending on language etc). We may not need to do this for the initial release; it may be better to wait and see how people want to use it? 🤔
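A minimal Go sketch of the two delete semantics, assuming a hypothetical `DeleteMode` parameter standing in for the `[one|all]` option above. The real SDK signature was still undecided at this point, so treat every name here as illustrative.

```go
package main

// DeleteMode selects how many matching values a hypothetical
// ListDelete(key, value, mode) would remove.
type DeleteMode int

const (
	DeleteOne DeleteMode = iota // remove only the first match (suggested default)
	DeleteAll                   // remove every match
)

// listDelete returns a new slice with value removed according to mode,
// plus the number of values removed. The input slice is left untouched.
func listDelete(values []string, value string, mode DeleteMode) ([]string, int) {
	out := make([]string, 0, len(values))
	removed := 0
	for _, v := range values {
		// Skip the match if deleting all, or if it is the first match seen.
		if v == value && (mode == DeleteAll || removed == 0) {
			removed++
			continue
		}
		out = append(out, v)
	}
	return out, removed
}
```

If lists end up holding only unique values, the two modes collapse into the same behaviour, which is one argument for deferring the option.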
In our case, we don't want duplicate values in the list, so it would be helpful to have documentation on how to deduplicate values in the list in the initial release. Moreover, to collect feedback from potential users, it would be nice to have documentation on how to migrate from Player Tracking, or an example for Player Tracking users, since Player Tracking doesn't expect duplicate values.
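If lists did allow duplicates, a client-side helper could deduplicate while keeping insertion order (first occurrence wins), which is how a Player Tracking style list reads. A minimal Go sketch; `dedupePreserveOrder` is a hypothetical helper, not part of any Agones SDK.

```go
package main

// dedupePreserveOrder drops repeated values while keeping the first
// occurrence of each, so the result reads like an insertion-ordered set.
func dedupePreserveOrder(values []string) []string {
	seen := make(map[string]struct{}, len(values))
	out := make([]string, 0, len(values))
	for _, v := range values {
		if _, ok := seen[v]; ok {
			continue // already kept an earlier copy
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	return out
}
```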
Yeah, that 100% makes sense - we need to make sure there is a migration path.
I'm leaning towards:
lists: # list of lists.
  players: # key for this list (players)
    capacity: 100 # set capacity
    unique: true # this makes it work like a set
So have the unique option (maybe defaulting to false, since it's called a "list") so that you can make it work like Player Tracking.
How does that sound?
The unique option sounds good to me.
I have several thoughts:
- The unique option should not be editable after the GameServer is created, since users know what the list is for and have no reason to change the option after defining the GameServer.
- The SDK could return a boolean value when users call Alpha().ListAppend(key, value) and Alpha().ListDelete(key, value), like Alpha().PlayerConnect(playerID) and Alpha().PlayerDisconnect(playerID) do, to cover the Player Tracking use case? 🤔
- I think we need more feedback on this.
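If `unique` were kept as an option, the first point could be enforced as an update-time validation check. A minimal Go sketch, assuming a hypothetical `ListSpec` type and `validateListUpdate` helper; neither is Agones code, and the real validation would live in the GameServer update path.

```go
package main

import "fmt"

// ListSpec is a hypothetical spec-level definition of one list.
type ListSpec struct {
	Capacity int64
	Unique   bool
}

// validateListUpdate rejects any change to `unique` on an existing
// GameServer. Keys missing from the new spec are skipped here, since
// adding or removing keys at runtime is out of scope for the design.
func validateListUpdate(oldSpec, newSpec map[string]ListSpec) []error {
	var errs []error
	for key, o := range oldSpec {
		n, ok := newSpec[key]
		if !ok {
			continue
		}
		if o.Unique != n.Unique {
			errs = append(errs, fmt.Errorf("lists.%s.unique is immutable: cannot change %t to %t", key, o.Unique, n.Unique))
		}
	}
	return errs
}
```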
Thanks for all the great discussion! Sorry I dropped off for a bit; I've been focusing on another open source project.
In the community meeting, we discussed only allowing unique items in a List (so basically an ordered set) if we couldn't come up with a use case for having multiple copies of the same item in a list, to avoid implementing features we didn't need.
Can anyone come up with a use case? If not, maybe we just drop the ability to store duplicate values in a list (and should we rename it to a sortedSet / orderedSet?).
The unique option should not be editable after the GameServer is created, since users know what the list is for and have no reason to change the option after defining the GameServer.
If we do go this route, totally agreed (see comment above).
The SDK could return a boolean value when users call Alpha().ListAppend(key, value) and Alpha().ListDelete(key, value), like Alpha().PlayerConnect(playerID) and Alpha().PlayerDisconnect(playerID) do, to cover the Player Tracking use case? 🤔
We can definitely do this. I had left them off since there was a lot of eventual-consistency management involved, but we can return results based on whatever the SDK knows at that point in time, from itself and/or from its current view of the CRD. I can't see any huge downside to adding this, so I'll put it on my list of things to add back in.
I don't have a use case for duplicate values in lists, so dropping the ability to store them sounds good to me. :)
I think it's better to rename it to sortedSet / orderedSet to match the actual behavior.
I have a question about data manipulation on allocation.
The ability to atomically increment or decrement counts on allocation
The ability to atomically add items to list on allocation
I plan to use multiple Allocator Services in a single k8s cluster for redundancy reasons. When I do this, are there any race conditions with data manipulation on allocation? For instance, could we end up with duplicate values in a list (even if we use sortedSet), or increment a counter too many times?
K8s resource modifications are protected by optimistic locking on the resource version: unless the local system has the latest version of a resource, any update is rejected, which avoids "last-update-wins" race conditions.
You could in theory select the same GameServer in succession if after an Allocation it still matches the search criteria for Allocated GameServers - but that's an exercise for the developer to find the appropriate level of locking for their game.
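Kubernetes implements this as optimistic concurrency on `metadata.resourceVersion`: an update carrying a stale `resourceVersion` is rejected with a 409 Conflict, and the client re-reads and retries. A minimal Go sketch of that loop, with a hypothetical in-memory `store` standing in for the API server and a single counter standing in for a GameServer; it shows the technique, not Agones' allocator code.

```go
package main

import (
	"errors"
	"sync"
)

// errConflict plays the role of the 409 Conflict returned for a stale resourceVersion.
var errConflict = errors.New("conflict: stale resourceVersion")

// object is a hypothetical stand-in for a GameServer with one counter.
type object struct {
	ResourceVersion int
	Count           int64
}

// store mimics the API server's compare-and-swap on resourceVersion.
type store struct {
	mu  sync.Mutex
	obj object
}

// get returns a copy of the current object, as a GET would.
func (s *store) get() object {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.obj
}

// update succeeds only if the caller saw the latest version, then bumps it.
func (s *store) update(o object) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if o.ResourceVersion != s.obj.ResourceVersion {
		return errConflict
	}
	o.ResourceVersion++
	s.obj = o
	return nil
}

// incrementWithRetry is what each allocator replica would do: read,
// modify, write, and on conflict re-read and retry instead of overwriting.
func incrementWithRetry(s *store, delta int64) (retries int) {
	for {
		o := s.get()
		o.Count += delta
		if err := s.update(o); err == nil {
			return retries
		}
		// errConflict: another writer committed first; loop to re-read.
		retries++
	}
}
```

Because every successful write must start from the latest version, two allocator replicas can never both apply an increment based on the same old value.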
Sorry for the long delay - was focusing on working towards a Quilkin release, and the addition of capacity to counts was tricky.
But we have updates! Would love your feedback!
Working through autoscaling led to a few changes to the design above.
Summary of changes:
- Counters can now have a capacity (maximum), this allows for autoscaling on counters, because we can calculate the difference between the capacity and the current amount, like we do for lists.
- Lists can only take a maximum of 1000 items to prevent overuse.
- Added a note that lists can only contain unique values (insertion-ordered sets)
- Determined that if you go over capacity, it’s essentially a silent no-op. If you need to ensure capacity is in place / the key exists, you need to explicitly filter for it.
- Added sections on FleetAutoscaling for counts and lists
- Added back in return values from the SDK, based on the SDK's current knowledge of the GameServer counts and lists.
- Updates to CUJs.
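A minimal Go sketch of the counter changes above, assuming a hypothetical `Counter` type: an increment that would exceed `Capacity` is a silent no-op, and `Capacity - Count` gives the headroom an autoscaler can sum across a Fleet, the same way it already can for lists. The names are illustrative, not the Agones implementation.

```go
package main

// Counter is a hypothetical model of a GameServer counter with a capacity.
type Counter struct {
	Count    int64
	Capacity int64
}

// Increment applies delta unless the result would exceed Capacity, in
// which case it silently does nothing and reports false.
func (c *Counter) Increment(delta int64) bool {
	next := c.Count + delta
	if next > c.Capacity {
		return false
	}
	c.Count = next
	return true
}

// Available is the remaining headroom on one GameServer.
func (c *Counter) Available() int64 {
	return c.Capacity - c.Count
}

// fleetAvailable sums headroom across GameServers, the kind of aggregate a
// counter-based FleetAutoscaler buffer could be compared against.
func fleetAvailable(counters []Counter) int64 {
	var total int64
	for _, c := range counters {
		total += c.Available()
	}
	return total
}
```

Since going over capacity is silent, a caller that must know the increment landed either checks the boolean or, at allocation time, filters explicitly for servers with room.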
Questions:
- We’ve used “capacity” across both lists and counters for consistency. Is this easy to understand?
Thanks for updates!
Using "capacity" across lists and counts sounds good to me :)
I'm not sure, but should we support multiple counts and lists per GameServer? 🤔 Because FleetAutoscaler doesn't support autoscaling on multiple counters and lists, it sounds like we don't need to support multiple counts and lists in the GameServer resource. What do you think?
That's an interesting question.
My thought was, while it may not be used for autoscaling, it may be used for allocation filtering.
It's a bit clunky, but say you want to track Rooms and players per room - you might have something like:
apiVersion: "agones.dev/v1"
kind: GameServer
metadata:
  generateName: "simple-game-server-"
spec:
  ports:
  - name: default
    portPolicy: Dynamic
    containerPort: 7654
  template:
    spec:
      containers:
      - name: simple-game-server
        image: gcr.io/agones-images/simple-game-server:0.13
  counters:
    rooms:
      default: 0
      capacity: 4
  lists:
    players_1:
    players_2:
    players_3:
    players_4:
That probably doesn't scale if you have 1000 rooms (at which point, go use a DB), but this works in a pinch.
So you would autoscale on rooms, but you might filter on the player lists. (Or actually, one big player list across all rooms would probably be better for this scenario, but you get the drift.)
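A minimal Go sketch of how the two kinds of value would be used differently for a GameServer shaped like the YAML above: the `rooms` counter drives the autoscaling headroom, while the `players_N` lists answer allocation-style lookups such as "which room is this player in?". `RoomServer`, `roomsAvailable` and `findPlayerRoom` are hypothetical names for illustration.

```go
package main

import "strings"

// RoomServer is a hypothetical snapshot of the GameServer in the YAML
// above: one "rooms" counter plus one players_N list per room.
type RoomServer struct {
	Name          string
	RoomsCount    int64
	RoomsCapacity int64
	Lists         map[string][]string // "players_1" .. "players_4"
}

// roomsAvailable is the value a counter-based autoscaler would look at.
func roomsAvailable(gs RoomServer) int64 {
	return gs.RoomsCapacity - gs.RoomsCount
}

// findPlayerRoom is the kind of lookup an allocation filter on lists
// enables: which players_N list (room) holds playerID, if any.
func findPlayerRoom(gs RoomServer, playerID string) (string, bool) {
	for key, players := range gs.Lists {
		if !strings.HasPrefix(key, "players_") {
			continue
		}
		for _, p := range players {
			if p == playerID {
				return key, true
			}
		}
	}
	return "", false
}
```

With one big player list instead, `findPlayerRoom` would reduce to a single membership check, at the cost of no longer knowing which room the player is in.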
WDYT?
Ah, that's true. It's possible that the metrics used for GameServer Allocation and the metrics used for AutoScale are different. Makes sense to me now. Thank you.
Thanks for the feedback and working through it with me!
Just broke this down into lots of smaller tasks, so it can start to be picked apart. PTAL and point out if I'm missing a step anywhere.
@igooch, can we tick a few of the boxes off in the checklist? I believe several are complete already (I can guess at the PRs above, but don't want to mess up your workflow 😄)
Doesn't look like I have the ability to check off the task list (you'd need to edit your comment and place an "x" in the box).
Merged tasks as of April 3, 2023 (tasks in parentheses are not explicitly tracked in the task list):
- Feature Flag creation
- GameServer CRD updates
- GameServerSet CRD updates
- Fleet CRD updates
- (Includes updated example gameserver.yaml)
- Alpha SDK .proto updates and stub methods on SdkServer
- (Implemented the LocalSDKServer methods and LocalSDKServer unit tests.)
- (Includes code generated from the alpha.proto for the Golang SDK, nothing implemented client side yet.)
- Defaults for counts on GameServerSpec
- Defaults for lists on GameServerSpec
- Population of GameServer -> Status on creation
- (Includes Defaults unit tests)
- Validation for counts on GameServerSpec
- Validation for lists on GameServerSpec
- (Includes Validation unit tests)
Doesn't look like I have the ability to check off the task list (you'd need to edit your comment and place an "x" in the box).
Well that's rude! 🤦🏻 I assigned the ticket to you, so maybe now you can.
Anyway, I ticked boxes! Lemme know if I got anything wrong.
Just a quick change update: moved metrics to using labels, which is far easier and lets us build generic dashboards.
Hello! I really like where this is heading, and we definitely have use cases that we will implement on top of it. The current issue doesn't mention custom autoscaling through a webhook, though. Is this planned?