agones
Player Tracking for each GameServer
Is your feature request related to a problem? Please describe.
It would be very useful to track player connection and disconnection against a specific GameServer.
This would allow us to do several things:
- Track if a GameServer is full or not (we should also track GameServer capacity as well)
- Be able to trace and debug a player lifecycle from matchmaker to gameserver connection (we should track an id or token on connection)
- Autoscale Fleets based on how full gameservers are (useful for persistent worlds)
Describe the solution you'd like (Full design TBD)
- Need a way to specify the capacity of a GameServer, at creation time, but also editable from the SDK at runtime
- SDK methods to track a player connection, and player disconnection with an id/player token
- GameServer CRD Events for player connections and disconnections
- GameServer CRD Status values that track player counts and capacity. Maybe should also have a list of current connected players?
- May want to add some labels to allow for searching for full/empty/partially full game servers through the k8s api
Describe alternatives you've considered
Have a separate system for player tracking - but based on feedback, this is a feature that almost all users should find useful, so it feels like part of Agones, and also allows some useful functionality down the road -- such as autoscaling by player count, or automating backfill operations by searching for non-full game servers.
Additional context
I do have concerns that we are adding more API QPS with this, so we should track performance with these changes.
I also do have a concern for etcd, it could potentially be a lot depending on the design.
A Kubernetes event each time a player joins?
Yeah agreed - was chatting with some people the other day, talking through this.
The basic ideas we ended up with, which I think would work better long term (there are some implementation design decisions to work out too), are:
- Only keep the player count and capacity on the GameServer CRD status value.
- The SDK could be smart, and only sync the player count once every second (or configurable interval?), to avoid overloading the system when huge numbers come in. Backfill based on player counts will have race conditions anyway, so games will need to deal with this regardless.
- The event might just be the update to the player count, not each player, on the same interval as above.
To track players - have a CRD configurable webhook that has player data sent to it on connection / disconnection. Something like the gameserver name, connect/disconnect and the token that is passed through. Then we can build a separate system(s) for storing and tracking connections, and/or respond to events as needed as well.
In theory the sidecar would need to send the webhook request (I think), but that's doable (and an implementation detail).
This requires a full write up, but I think it sounds like a reasonable approach at first pass. WDYT?
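To make the "only sync once per interval" idea concrete, here is a minimal sketch in Python. It is not Agones code: `BatchedPlayerCount`, its `sync` callback and `flush_now` are hypothetical names chosen for illustration, and the one-second default simply mirrors the interval suggested above. The point it shows is that many count changes inside one interval collapse into a single update carrying the latest value.

```python
import threading


class BatchedPlayerCount:
    """Hypothetical sketch: coalesce count changes into one sync per interval."""

    def __init__(self, sync, interval=1.0):
        self._sync = sync          # callback that would update the GameServer status
        self._interval = interval  # seconds between a change and its sync
        self._count = 0
        self._pending = None       # timer for the next scheduled sync, if any
        self._lock = threading.Lock()

    def change(self, delta):
        """Apply a count change and schedule a sync unless one is already pending."""
        with self._lock:
            self._count += delta
            if self._pending is None:
                self._pending = threading.Timer(self._interval, self._flush)
                self._pending.daemon = True
                self._pending.start()

    def _flush(self):
        # Runs on the timer thread: clear the pending marker, then sync the latest count.
        with self._lock:
            self._pending = None
            count = self._count
        self._sync(count)

    def flush_now(self):
        """Cancel any pending timer and sync immediately (e.g. on shutdown)."""
        with self._lock:
            timer, self._pending = self._pending, None
            count = self._count
        if timer is not None:
            timer.cancel()
        self._sync(count)
```

With this shape, a burst of connections triggers one status update rather than one per player, which is the API QPS saving the comment above is after.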
Design ideas
The following is the implementation design for the configuration, status, and game server SDK.
Feedback is much appreciated.
Configuration & Status
apiVersion: "agones.dev/v1"
kind: GameServer
metadata:
  generateName: "simple-udp-"
spec:
  # new configuration
  alpha:
    players:
      initialCapacity: 10 # sets the initial player capacity. Defaults to 0.
      webhook: # http(s) webhook to send player dis/connection events
        service:
          name: player-tracking-service
          namespace: default
          path: player
  ports:
  - name: default
    portPolicy: Dynamic
    containerPort: 7654
  template:
    spec:
      containers:
      - name: simple-udp
        image: gcr.io/agones-images/udp-server:0.17
status:
  alpha:
    # tracks players
    players:
      count: 6 # current number of players. Only updated once per second, so eventually consistent.
      capacity: 10 # current capacity. Set by "initialCapacity" and can be changed by the SDK
- spec.alpha.players.initialCapacity - the initial player capacity, as listed in the status value status.alpha.players.capacity
- spec.alpha.players.webhook - a webhook configuration that is called whenever a player connects or disconnects. This sort of data seems ill-suited to CRDs, so Agones never stores these values; it’s the responsibility of another system to track this information as required. The webhook POSTs a JSON payload with the following details:
  - GameServer name
  - PlayerID (provided through the SDK)
  - Event: “connect” or “disconnect”
- status.alpha.players.count - The current number of players in the system. This information will be eventually consistent as it is only updated once per second to avoid overloading the system. Since player count race conditions are inevitable, there should always be extra checking of player counts on player connection to a game server.
- status.alpha.players.capacity - The current player capacity of this gameserver. Initially set by the initialCapacity value, but also able to be changed at runtime by the Game Server SDK.
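As a rough illustration of the webhook body described above, the sketch below builds a JSON payload carrying the three listed values. The design only names the values, so the key names (`gameServer`, `playerID`, `event`) and the function `player_event_payload` are assumptions made for this example, not a defined Agones schema.

```python
import json


def player_event_payload(gameserver_name, player_id, event):
    """Build a JSON body for a player connect/disconnect webhook call.

    Key names are illustrative only; the design lists the values, not a schema.
    """
    if event not in ("connect", "disconnect"):
        raise ValueError(f"unknown event: {event!r}")
    return json.dumps(
        {"gameServer": gameserver_name, "playerID": player_id, "event": event},
        sort_keys=True,
    )
```

A receiving service could then store or react to these events on its own terms, which is the separation of concerns this version of the design proposes.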
SDK Functionality
SDK.Alpha.PlayerConnect(playerID)
This increases the SDK’s stored player count by one, and passes the playerID to the system such that it can fire the webhook with a connection event to any configured receivers.
status.alpha.players.count is then set to update to the current player count a second from now, unless there is already an update pending.
Will throw an error if the player count is equal to or greater than the capacity.
SDK.Alpha.PlayerDisconnect(playerID) : bool
Decreases the SDK’s stored player count by one, and passes the playerID to the system such that it can fire the webhook with a disconnection event to any configured receivers.
status.alpha.players.count is then set to update to the current player count a second from now, unless there is already an update pending.
Does nothing, and returns false if the playerID was not previously added. Returns true otherwise.
SDK.Alpha.SetPlayerCapacity(int capacity)
Updates the status.alpha.players.capacity value with a new capacity.
SDK.Alpha.GetPlayerCapacity() : int
Retrieves the current capacity. This is always accurate, even if the value hasn’t been updated to the GameServer status yet.
SDK.Alpha.GetPlayerCount() : int
Retrieves the current player count. This is always accurate, even if the value hasn’t been updated to the GameServer status yet.
I have some questions regarding PlayerConnect:
SDK.Alpha.PlayerConnect(playerID) This increases the SDK’s stored player count by one, and passes the playerID to the system such that it can fire the webhook with a connection event to any configured receivers.
Should we wait for webhook response that this player is able to connect? Who would be judging the player if he is able or not able to connect to the session?
Would playerID be stored on GS SDK side or separately?
Then we can build a separate system(s) for storing and tracking connections, and/or respond to events as needed as well.
Hi Mark -
Could this metric be utilized for autoscaling? I'm wondering for the relay server use case, in which multiple relay server processes would run inside a single GameServer container and we need a test for "fullness" for autoscaling up and down.
--Rob
View it on GitHub: https://github.com/googleforgames/agones/issues/1033#issuecomment-577216157
--
Robert Martin, Chief Architect, Google Cloud for Games
@aLekSer good questions!
Should we wait for webhook response that this player is able to connect?
Personally, I don't think so. This could potentially make things very slow in the game server binary. I feel like this should be an async operation that happens behind the scenes.
Who would be judging the player if he is able or not able to connect to the session?
This is up to the game server binary to make this decision. So it's game / transport layer specific.
Would playerID be stored on GS SDK side or separately?
In this design, we don't store the playerID at all (except transiently in memory as it gets passed to the system that sends the webhook request). playerID management is up to the game author to track and manage. This is the main impetus for the webhook, to make this easier.
While it could be useful to store playerIDs in a CRD, it seems like a performance concern for etcd. Maybe something to explore down the line if we find we have bandwidth in etcd performance.
@cloudrobx
Could this metric be utilized for autoscaling? I'm wondering for the relay server use case, in which multiple relay server processes would run inside a single GameServer container and we need a test for "fullness" for autoscaling up and down.
Like this? #1034 :smile: would that work for what you were thinking?
Yeah something like that looks pretty good. For placement purposes is there an easy way to pull the most (or least) full game server by player count, other than iterating through all the game servers in the cluster?
Yeah something like that looks pretty good. For placement purposes is there an easy way to pull the most (or least) full game server by player count, other than iterating through all the game servers in the cluster?
See: #1239 for current initial thoughts.
For relays / high density would probably also want the ability to find the most empty or most full server as well. Since you're essentially adding another layer of bin-packing.
For relays / high density would probably also want the ability to find the most empty or most full server as well. Since you're essentially adding another layer of bin-packing.
This sounds like the current "Distributed" vs "Packed" strategies we already have in place. Also probably a topic to discuss more on #1239 rather than here, since this ticket isn't covering allocation - just player count tracking.
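For readers following the "most full / most empty" question, here is a naive client-side sketch of what that selection means. It does iterate over every game server, which is exactly the cost being asked about, so treat it as an illustration of the selection rule and not as how Agones allocation works. The function `pick_game_server` is hypothetical, the dictionaries only mimic the status.players shape from the design, and the "Packed" / "Distributed" names are borrowed from the strategies mentioned above.

```python
def pick_game_server(game_servers, strategy="Packed"):
    """Return the name of the fullest ("Packed") or emptiest ("Distributed")
    game server that still has room, or None if every server is full."""

    def player_count(gs):
        return gs["status"]["players"]["count"]

    # Only servers with spare capacity are candidates.
    candidates = [
        gs for gs in game_servers
        if player_count(gs) < gs["status"]["players"]["capacity"]
    ]
    if not candidates:
        return None
    if strategy == "Packed":
        chosen = max(candidates, key=player_count)
    else:
        chosen = min(candidates, key=player_count)
    return chosen["metadata"]["name"]
```

Because the stored count is only eventually consistent, whichever server is picked still needs to re-check capacity when the player actually connects.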
Made a small tweak to SDK. Renamed GetCapacity and SetCapacity to GetPlayerCapacity and SetPlayerCapacity - just to be clearer.
Some design updates that came out of the discussion on https://github.com/googleforgames/agones/pull/1447#discussion_r406637680.
And we came back around to storing the list of player id tokens on the CRD, to avoid any issues with storing player Ids in memory, and things that could go wrong if the Agones sidecar crashed.
Also the extra data stored wouldn't be that large, and we wouldn't take any extra API hits, since we would only update the CRD at the same time we update counts (which are batched).
So the following design changes will occur (I'll update the above design section shortly).
- Create a section on the CRD to store player ids
- Drop the webhook CRD section, as a Kubernetes GameServer update would have all the information about a player connecting/disconnecting, since it has both the before and after states (and would be a nice batch operation).
- Implement a SDK.IsPlayerConnected(token): bool method
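The reason the webhook can be dropped is that the before and after id lists of a GameServer update already encode who joined and who left. A minimal sketch of that comparison follows; `player_changes` is a hypothetical helper, and the inputs stand in for the ids list on the old and new versions of the object delivered by a watch.

```python
def player_changes(old_ids, new_ids):
    """Compare the id lists from the before/after states of a GameServer update.

    Returns (connected, disconnected), each as a sorted list of player ids.
    """
    old, new = set(old_ids), set(new_ids)
    return sorted(new - old), sorted(old - new)
```

Since updates are batched, one comparison can report several connections and disconnections at once. A player who connects and disconnects within the same batch window never shows up in either list.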
We are also discussing whether SDK.PlayerConnect() and/or SDK.PlayerDisconnect() should return a bool/exception if the player id did / didn't exist when expected. I just noticed that we had designed PlayerDisconnect() to return a bool with a success/failure, so unless anyone has objections, I'll update the design on SDK.PlayerConnect() to do the same thing. But if you think it should return an exception, please definitely say so.
v2 Design
The following is the implementation design for the configuration, status, and game server SDK.
This includes feedback and decisions from several PRs and commentary.
Feedback is much appreciated.
Configuration & Status
apiVersion: "agones.dev/v1"
kind: GameServer
metadata:
  generateName: "simple-udp-"
spec:
  # new configuration
  players:
    initialCapacity: 10 # sets the initial player capacity. Defaults to 0.
  ports:
  - name: default
    portPolicy: Dynamic
    containerPort: 7654
  template:
    spec:
      containers:
      - name: simple-udp
        image: gcr.io/agones-images/udp-server:0.17
status:
  # tracks players
  players:
    count: 6 # current number of players. Only updated once per second, so eventually consistent.
    capacity: 10 # current capacity. Set by "initialCapacity" and can be changed by the SDK
    ids: ["b7xy0", "mn87i", "p9un1"] # list of player ids currently connected
- spec.players.initialCapacity - the initial player capacity, as listed in the status value status.players.capacity
- status.players.count - The current number of players in the system. This information will be eventually consistent as it is only updated once per second to avoid overloading the system. Since player count race conditions are inevitable, there should always be extra checking of player counts on player connection to a game server.
- status.players.capacity - The current player capacity of this gameserver. Initially set by the initialCapacity value, but also able to be changed at runtime by the Game Server SDK.
- status.players.ids - The set of connected player ids at this given moment.
SDK Functionality
All SDK functions that change the GameServer state will be asynchronous, in line with all other state-changing SDK commands. To copy and paste from the docs:
Calling any of state changing functions mentioned below does not guarantee that GameServer Custom Resource object would actually change its state right after the call ... You can verify the result of this call by waiting for the desired state in a callback to WatchGameServer() function.
SDK.Alpha.PlayerConnect(playerID) : bool
This increases the SDK’s stored player count by one, and appends this playerID to status.players.ids.
status.players.count and status.players.ids are then set to update to the player count and id list a second from now, unless there is already an update pending, in which case the update joins that batch operation.
PlayerConnect returns true and adds the playerID to the list of playerIDs if the playerID was not already in the list of connected playerIDs.
If the player exists within the list of connected playerIDs, PlayerConnect will return false, and the list of connected playerIDs will be left unchanged.
An error will be returned if the playerID was not already in the list of connected playerIDs but the player capacity for the server has been reached. The playerID will not be added to the list of playerIDs.
SDK.Alpha.PlayerDisconnect(playerID) : bool
Decreases the SDK’s stored player count by one, and removes the playerID from status.players.ids.
status.players.count and status.players.ids are then set to update to the player count and id list a second from now, unless there is already an update pending, in which case the update joins that batch operation.
PlayerDisconnect will return true and remove the supplied playerID from the list of connected playerIDs if the playerID value exists within the list.
If the playerID was not in the list of connected playerIDs, the call will return false, and the connected playerID list will be left unchanged.
SDK.Alpha.SetPlayerCapacity(int capacity)
Updates the status.players.capacity value with a new capacity.
SDK.Alpha.GetPlayerCapacity() : int
Retrieves the current capacity. This is always accurate, even if the value hasn’t been updated to the GameServer status yet.
SDK.Alpha.GetPlayerCount() : int
Retrieves the current player count. This is always accurate, even if the value hasn’t been updated to the GameServer status yet.
SDK.Alpha.IsPlayerConnected(playerID) : bool
Returns if the playerID is currently connected to the GameServer. This is always accurate, even if the value hasn’t been updated to the GameServer status yet.
SDK.Alpha.GetConnectedPlayers() : []string
Returns the list of the currently connected player ids. This is always accurate, even if the value hasn’t been updated to the GameServer status yet.
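The return-value rules above are easier to check against a small in-memory model. The sketch below is a Python stand-in for the local SDK state only: `PlayerTracker` and `CapacityReachedError` are hypothetical names, the method names are snake_case renderings of the functions listed above, and nothing here talks to Kubernetes or performs the batched status update.

```python
class CapacityReachedError(Exception):
    """Raised when a new player would exceed the configured capacity."""


class PlayerTracker:
    """Hypothetical in-memory model of the v2 player tracking semantics."""

    def __init__(self, capacity=0):
        self._capacity = capacity
        self._ids = []  # connected player ids, in connection order

    def player_connect(self, player_id):
        if player_id in self._ids:
            return False  # already connected: list left unchanged
        if len(self._ids) >= self._capacity:
            raise CapacityReachedError(f"capacity {self._capacity} reached")
        self._ids.append(player_id)
        return True

    def player_disconnect(self, player_id):
        if player_id not in self._ids:
            return False  # was not connected: list left unchanged
        self._ids.remove(player_id)
        return True

    def set_player_capacity(self, capacity):
        self._capacity = capacity

    def get_player_capacity(self):
        return self._capacity

    def get_player_count(self):
        return len(self._ids)

    def is_player_connected(self, player_id):
        return player_id in self._ids

    def get_connected_players(self):
        return list(self._ids)  # copy, so callers cannot mutate internal state
```

Note the ordering in `player_connect`: the already-connected check comes first, so a repeated connect for an existing player returns false even when the server is full, and the capacity error is reserved for genuinely new players.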
I like this ... only thing that I do find strange is that the SDK.Alpha.PlayerConnect & SDK.Alpha.PlayerDisconnect are not idempotent.
Personally for SDK.Alpha.PlayerConnect would feel nicer to return true if you are connected already or connecting for the first time and false if you are unable to connect.
Similarly with SDK.Alpha.PlayerDisconnect I feel that it would be nicer to return true if the player was connected and you disconnect them and false otherwise.
It may also be worth having a GetConnectedPlayers API call as we expose the data in the CRD:
ids: ["b7xy0", "mn87i", "p9un1"] # list of player ids currently connected
I think we might be saying the same thing, although maybe my words are confusing:
I like this ... only thing that I do find strange is that the SDK.Alpha.PlayerConnect & SDK.Alpha.PlayerDisconnect are not idempotent.
Both methods should be idempotent - I think we're on the same page there.
Personally for SDK.Alpha.PlayerConnect would feel nicer to return true if you are connected already or connecting for the first time and false if you are unable to connect.
My design would return true on the first call of this method with a unique playerId. I think your wording might be better. It would return false only if the player is already marked as connected (hence the idempotency). The bool value is an indicator of a successful operation.
Similarly with SDK.Alpha.PlayerDisconnect I feel that it would be nicer to return true if the player was connected and you disconnect them and false otherwise.
I think we're saying the same things here too. Is there a better way it could be worded?
It may also be worth having a GetConnectedPlayers API call as we expose the data in the CRD:
I couldn't decide if we should have that, so figured I'd wait and see if someone requested it. Sounds like a good idea, I'll add it to the design now! :+1:
Ah, I think I was using idempotent wrong in thinking we should have the same return value each call, but this is in fact not required. So yes I think we are both on the same page and just a slight update on wording. 👍
Have had an attempt at rewording (mainly for my own understanding) feel free to take bits or ignore. 😀
PlayerConnect
From:
Returns false if the playerID is already marked as connected, true if not.
To:
PlayerConnect returns true and adds the playerid to the list of playerids if the playerid was not already marked as connected. If the player was already marked as connected, false will be returned, and the list of connected playerids will be left unchanged.
An error will be returned if the playerid was not already in the list but the player capacity for the server has been reached. The playerid will not be added to the list of playerids.
PlayerDisconnect
From:
Returns false if the playerID was not previously marked as connected. Returns true otherwise.
To:
PlayerDisconnect will return true and remove the supplied playerid from the list of playerids. If the playerid was not in the list of playerids the call will return false and the list will be left unchanged.
@domgreen nice! Much more explicit. I tweaked it a little in some places, but I think it reads much better (and is going to end up being our docs anyway).
Please take a look - I think this is much better :+1:
An error will be returned if the playerid was not already in the list but the player capacity for the server has been reached. The playerid will not be added to the list of playerids.
In both connect and disconnect, will we return an error if the sidecar is unable to update the CRD? Or since the CRD updates are batched and async, is that not possible?
If the sidecar can't update the CRD, is there a way to inform the game that there is an issue? Or will it just be silently swallowed by the sidecar and only visible in the sidecar logs?
@markmandel much better :) just typo playerDIs in Connect.
@roberthbailey why might we not be able to update the CRD?
I think as this would all be async and the source of truth would be in memory (actually probably in the game I think). If for whatever reason an error occurred, the gameplay code would need to handle it. The game itself doesn't really want an error if the CRD isn't updated (my 2c). I would wait till the next update time and try to set it again to the latest value.
In both connect and disconnect, will we return an error if the sidecar is unable to update the CRD? Or since the CRD updates are batched and async, is that not possible?
This is a good point. I will make a note on the design doc -- but this SDK implementation would match the other SDK methods. The functions are async in that they go into our standard workerqueue, and will retry until completed. The only difference here being that they are enqueued one second in the future. So the execution may intermittently fail (say a master goes down for a short period), but will eventually pass on retry (unless something catastrophic occurs).
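A simplified, synchronous stand-in for that behaviour (delay first, then retry until the update succeeds) might look like the sketch below. It is not the Agones workerqueue: `sync_with_retry`, the attempt limit, the backoff value and the use of `ConnectionError` to represent a transient API failure are all assumptions made for illustration.

```python
import time


def sync_with_retry(update, attempts=5, initial_delay=1.0, backoff=0.1, sleep=time.sleep):
    """Wait initial_delay seconds, then call update() until it succeeds.

    Returns the number of attempts used. Re-raises the last error if every
    attempt fails (the "something catastrophic" case).
    """
    sleep(initial_delay)  # the update is enqueued one second in the future
    for attempt in range(1, attempts + 1):
        try:
            update()
            return attempt
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(backoff)  # brief pause before retrying
```

Because the in-memory state stays the source of truth, each retry can simply push the latest values, so a missed update does not need to be surfaced to the game as an error.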
I think as this would all be async and the source of truth would be in memory.
Exactly. Which is how we handle all GameServer changes from the SDK, and also why we have SDK methods that say that "this will always be accurate, even if the details haven't been stored on the CRD". So SDK.GetGameServer() will watch what is stored in the CRD, whereas SDK.Alpha.GetPlayerCount() (and related) will look at the local, in memory version, which is updated to the CRD on change.
This has been released in Alpha.
A heads up to @devjgm, @Reousa, @steven-supersolid, @mollstam, @dotcom, @drichardson, who have been doing a lot of SDK work, to include this. If you have cycles to work on adding these to your SDK of expertise, please let us know here.
Hey Mark! Thanks for the heads up, would be great to be able to contribute again! :)
Gave the issue a read, I see these have already been updated to the relevant proto files, so all that's needed is the relevant language implementation, yeah?
If my understanding is correct, consider it done (hopefully in a timely manner)! 👌
That's a good point - you will need to edit the gen.sh scripts such that they generate the code for the alpha.proto, and then yes -- implement the SDK over the top :+1:
Thanks!
Working on Node.js now
I'm going to suggest we close this ticket, since this has been released to alpha and is (mostly?) supported in the SDKs - and I think we can then track each of those as separate tickets as needed.
Also #1677 is going to break all the things (will likely aim to tackle in the next release), so we can track the ongoing work there as well.
Sound good?
Closing as per the last comment (1.5 years ago).