
Explore the impact of K8s API Priority and Fairness

Open · markmandel opened this issue 3 years ago · 0 comments

Context: https://kubernetes.io/docs/concepts/cluster-administration/flow-control/

This feature is in beta and enabled by default from K8s 1.20 onwards.

Summary

Controlling the behavior of the Kubernetes API server in an overload situation is a key task for cluster administrators. The kube-apiserver has some controls available (i.e. the --max-requests-inflight and --max-mutating-requests-inflight command-line flags) to limit the amount of outstanding work that will be accepted, preventing a flood of inbound requests from overloading and potentially crashing the API server, but these flags are not enough to ensure that the most important requests get through in a period of high traffic.
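To make the limitation concrete, here is a minimal sketch (not the kube-apiserver's actual code) of what a max-inflight style limit boils down to: a fixed number of slots, with any request that arrives while every slot is busy being rejected immediately. The type and function names are hypothetical. Note that nothing in it looks at who sent the request, which is exactly why these flags alone can't protect the most important traffic.

```go
package main

import "fmt"

// inflightLimiter is a hypothetical sketch of a max-inflight style limit.
// It has a fixed number of slots; a request arriving while all slots are
// busy is rejected, regardless of who sent it or how important it is.
type inflightLimiter struct {
	slots chan struct{}
}

// newInflightLimiter creates a limiter allowing at most max concurrent requests.
func newInflightLimiter(max int) *inflightLimiter {
	return &inflightLimiter{slots: make(chan struct{}, max)}
}

// tryAcquire takes a slot without blocking. false means the request is
// rejected (the API server answers HTTP 429 Too Many Requests in this case).
func (l *inflightLimiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot once a request has completed.
func (l *inflightLimiter) release() { <-l.slots }

func main() {
	l := newInflightLimiter(2)
	fmt.Println(l.tryAcquire(), l.tryAcquire()) // both fit
	fmt.Println(l.tryAcquire())                 // third request is rejected
	l.release()
	fmt.Println(l.tryAcquire()) // a freed slot can be reused
}
```

A flood from one noisy client fills the slots just as easily as a flood from many, so an important request arriving at the wrong moment gets rejected along with everything else.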

The API Priority and Fairness feature (APF) is an alternative that improves upon the aforementioned max-inflight limitations. APF classifies and isolates requests in a more fine-grained way. It also introduces a limited amount of queuing, so that no requests are rejected in cases of very brief bursts. Requests are dispatched from queues using a fair queuing technique so that, for example, a poorly-behaved controller need not starve others (even at the same priority level).
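The "poorly-behaved controller need not starve others" point is easiest to see with a toy example. The sketch below is plain round-robin over per-flow queues, which is a deliberately simplified stand-in: APF's real dispatcher uses shuffle sharding and a virtual-time fair queuing algorithm. The `request` type, flow names and function are all hypothetical, just to show the idea.

```go
package main

import "fmt"

// request is a hypothetical queued API request, tagged with the flow
// (e.g. the client or service account) that sent it.
type request struct {
	Flow string
	ID   int
}

// roundRobinDispatch drains per-flow queues one request at a time, cycling
// over the flows in a fixed order. This is a simplified stand-in for APF's
// fair queuing, not its actual algorithm: it only shows that a flow with a
// long backlog cannot starve a flow with a short one.
// Note: it consumes the queues in the map it is given.
func roundRobinDispatch(flowOrder []string, queues map[string][]request) []request {
	var dispatched []request
	for {
		progressed := false
		for _, flow := range flowOrder {
			q := queues[flow]
			if len(q) == 0 {
				continue
			}
			dispatched = append(dispatched, q[0])
			queues[flow] = q[1:]
			progressed = true
		}
		if !progressed {
			return dispatched
		}
	}
}

func main() {
	queues := map[string][]request{
		"noisy-controller": {{"noisy-controller", 1}, {"noisy-controller", 2}, {"noisy-controller", 3}, {"noisy-controller", 4}},
		"quiet-controller": {{"quiet-controller", 1}},
	}
	order := []string{"noisy-controller", "quiet-controller"}
	for _, r := range roundRobinDispatch(order, queues) {
		fmt.Printf("%s #%d\n", r.Flow, r.ID)
	}
}
```

With a single FIFO queue the quiet controller's request would be served fifth; here it goes out second, no matter how many requests the noisy controller has queued.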

We should explore whether, and how, this affects Agones: we make a lot of K8s API requests because we're so churny on the cluster.

From what I can tell, as an operator author, I don't think we have much control. If I'm understanding this correctly, a FlowSchema matches requests to the cluster (by service account, resource type, type of request, etc.) and maps them to a PriorityLevelConfiguration, which says how much of the API server's total concurrency (--max-requests-inflight plus --max-mutating-requests-inflight) those requests are allowed. All of this is set by the cluster admin. I have yet to try, but I don't think we can edit it.
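Based on my reading of the flow-control docs, the split works roughly like this: the server's total concurrency limit (the two flags added together) is divided among the non-exempt priority levels in proportion to each level's concurrency shares (`assuredConcurrencyShares` in the beta API, renamed `nominalConcurrencyShares` in later versions), rounding up. The sketch below ignores exempt levels and the borrowing between levels that newer versions add. The level names and share values are made up for illustration; they are not the cluster defaults.

```go
package main

import (
	"fmt"
	"math"
)

// priorityLevel is a hypothetical, simplified stand-in for a
// PriorityLevelConfiguration: just a name and its concurrency shares.
type priorityLevel struct {
	Name   string
	Shares int
}

// concurrencyLimits splits the server's total concurrency limit
// (--max-requests-inflight + --max-mutating-requests-inflight) across
// priority levels in proportion to their shares, rounding up.
func concurrencyLimits(maxInflight, maxMutatingInflight int, levels []priorityLevel) map[string]int {
	serverCL := maxInflight + maxMutatingInflight
	total := 0
	for _, l := range levels {
		total += l.Shares
	}
	limits := make(map[string]int, len(levels))
	if total == 0 {
		return limits // nothing to divide among
	}
	for _, l := range levels {
		limits[l.Name] = int(math.Ceil(float64(serverCL*l.Shares) / float64(total)))
	}
	return limits
}

func main() {
	// 400 and 200 are the default values of the two flags;
	// the priority levels and their shares are hypothetical.
	levels := []priorityLevel{
		{"controllers", 40},
		{"workloads", 100},
		{"default", 20},
	}
	limits := concurrencyLimits(400, 200, levels)
	for _, l := range levels {
		fmt.Printf("%s: %d concurrent requests\n", l.Name, limits[l.Name])
	}
}
```

The takeaway for us: whichever priority level our controller's requests land in caps how many of them the API server will work on at once, and both that mapping and the shares are cluster-admin settings.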

Anyway, we should run some tests and see what we can see. If nothing else, it may produce some documentation.

markmandel, Jun 28 '22 00:06