Slow apiserver with Kopf
Question
Can a Kopf operator slow down the Kubernetes apiserver? When I create, say, 100 custom resources (CRs), the apiserver becomes very slow at handling the operator's patch requests: serving a single patch takes around 2 seconds.
Log from the apiserver:
I0714 06:27:08.262884 1 trace.go:205] Trace[662526312]: "Patch" url:/apis/l7mp.io/v1/namespaces/default/rules/worker-rtp-rule-32343236-fromtag3234,user-agent:kopf/1.32.1,client:172.17.0.3,accept:*/*,protocol:HTTP/1.1 (14-Jul-2021 06:27:06.063) (total time: 2199ms):
Trace[662526312]: ---"Recorded the audit event" 2183ms (06:27:00.247)
Trace[662526312]: [2.199543224s] [2.199543224s] END
I am asking here because I haven't encountered anything like this before while using the operator.
Checklist
- [x] I have read the documentation and searched there for the problem
- [x] I have searched in the GitHub Issues for similar questions
Keywords
apiserver, peering, kopf-managed, patch
Hello.
Long story short, I cannot answer that.
My assumption would be that Kopf-based operators, just like non-Kopf-based operators, are simply API clients. They do not sneak into the API server's internals. Therefore, there is no way for the operator itself to slow down the API server merely because it is a Kopf-based operator.
You can try the same with CLI patching to see if the same slowdown happens (Kopf currently uses JSON merge-patches):
kubectl patch -f examples/obj.yaml --type merge -p '{"metadata": {"annotations": {"test": "test"}}}'
However, I can imagine several ways in which that number in your logs can grow.
First, if the payload is huge. But since Kopf logs all the patches it performs, you would see that payload in the logs too. Relatedly, it can be slow if Kubernetes sends back a huge payload, because it responds with the resulting patched object. Both cases depend on the size of the custom resources: I would only worry if it is hundreds of kilobytes or even megabytes per patch, or if the network between the operator and the API server is slow.
Second, depending on what exactly that "audit event" measures, it can be a slow connection opening or closing. For example, open a connection, idle for 1 second, quickly do the patch in 0.1 seconds, idle for 1 more second, close the connection — 2.1 seconds in total.
This might easily happen, for example, if synchronous function calls are used in an asynchronous context (e.g. in async def handlers, or in the filtering callbacks). Python's asyncio event loop can be blocked by such sync operations, thus preventing TCP connection finalization (or handshaking, sending, receiving, closing: everything network-related). The solution would be to use sync handlers (def instead of async def) or asyncio executors for the synchronous operations.
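For illustration, here is a minimal sketch of both workarounds, with a hypothetical blocking call represented by time.sleep() and the resource kind taken from your logs (the handler names are placeholders, not from any real code):

```python
import asyncio
import functools
import time

import kopf


# Variant 1: a plain sync handler. Kopf runs sync handlers in a thread pool,
# so the blocking call does not stall the asyncio event loop.
@kopf.on.create('l7mp.io', 'v1', 'rules')
def create_fn(spec, **_):
    time.sleep(2)  # stands for any blocking call (HTTP clients, DB drivers, etc.)


# Variant 2: an async handler that pushes the blocking work to an executor
# instead of calling it inline (which would block all of Kopf's networking).
@kopf.on.update('l7mp.io', 'v1', 'rules')
async def update_fn(spec, **_):
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, functools.partial(time.sleep, 2))
```

Either way, the event loop stays free to finish Kopf's own API traffic (including those PATCH requests) while the slow work runs elsewhere.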
I wouldn't call it a problem, though it might look unusual that a small patch can take 2 seconds, 99% of which is spent idling (waiting for network activity).
Third, as an extension of the second: the asyncio event loop can be blocked by normal Python code, since Python takes its share of time to execute anyway. But this can only happen if your operator is really (really!) overloaded, and 100 CRs are definitely not enough. I would consider such effects possible starting from maybe 1'000 CRs having activity at the same moment, maybe even 10'000.
With 100 CRs, this can only be simulated if the operator does not get its fair share of the CPU, e.g. if you set its CPU limit to 1 milli-core (or any similarly small amount). That would be an equivalent of sync operations in async code: there is no difference between blocking the asyncio network activity with time.sleep() and having the process throttled at the CPU level. But again, the throttling would have to be at such a scale that the CPU does not come back to the process for a second or two, which would be extreme on its own. It can be solved by giving more CPU to the operator.
You can also read Henning Jacobs' talks or https://k8s.af/ for several stories about CPU quotas and CPU throttling, and why they are disabled at Zalando.
Besides this, I have no other ideas why a presumably small patch can take 2 seconds.
Thank you very much for your quick and helpful answer! Now I have a much clearer understanding of CRs and Kopf.
I will try these out, and I hope I can solve the problem. Meanwhile, I found a very interesting issue that might help me out, because I don't use the x-kubernetes-map-type: atomic flag, and maybe I am missing some x-kubernetes-preserve-unknown-fields: true flags.
What do you think about that?
I am not that experienced with Kubernetes management and low-level optimizations, so I have no opinion on this. The referenced issue seems interesting, though.
Keep in mind that if you have 700KB (or 1MB) CRs, Kopf will add its last-handled-configuration annotation, roughly doubling that size to 1400KB (or 2MB). This can make it even slower.
If that is the case, consider making your own diff-base storage class: one that inherits from kopf.DiffBaseStorage and implements the fetch() and store() methods (see the provided storages as examples), but stores the state elsewhere, e.g. in Redis, MySQL, Postgres, SQLite, raw files, or whatever else is persistent across operator restarts (not in memory!):
- https://kopf.readthedocs.io/en/stable/configuration/#change-detection
- https://kopf.readthedocs.io/en/stable/packages/kopf/?highlight=DiffBaseStorage#kopf.DiffBaseStorage
- https://github.com/nolar/kopf/blob/1.32.1/kopf/_cogs/configs/diffbase.py#L106-L209 (examples)
Using your own storage will remove the last-handled-configuration annotation, and with it the extra size and, partially, the network traffic back and forth.
The patches will remain, but they will be minimalistic: only the progress storage with its tiny temporary annotations (a JSON with 5-6 fields of true/false/timestamps).
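For illustration, here is a rough sketch of such a storage. It assumes the keyword-only fetch()/store() signatures from the linked source, and uses a local SQLite file as the external store; the class name, file path, and table schema are placeholders of my own:

```python
import json
import sqlite3

import kopf


class SqliteDiffBaseStorage(kopf.DiffBaseStorage):
    """Keep the last-handled essence in a local SQLite file instead of the
    last-handled-configuration annotation on the object itself."""

    def __init__(self, path='/var/lib/my-operator/diffbase.sqlite'):
        super().__init__()
        self._db = sqlite3.connect(path, check_same_thread=False)
        self._db.execute(
            'CREATE TABLE IF NOT EXISTS diffbase (uid TEXT PRIMARY KEY, essence TEXT)')

    def fetch(self, *, body):
        uid = body['metadata']['uid']
        row = self._db.execute(
            'SELECT essence FROM diffbase WHERE uid = ?', (uid,)).fetchone()
        return json.loads(row[0]) if row else None

    def store(self, *, body, patch, essence):
        # Persist the essence locally; nothing is added to the patch,
        # so no last-handled-configuration annotation is written.
        uid = body['metadata']['uid']
        self._db.execute(
            'INSERT OR REPLACE INTO diffbase (uid, essence) VALUES (?, ?)',
            (uid, json.dumps(essence)))
        self._db.commit()


@kopf.on.startup()
def configure_diffbase(settings: kopf.OperatorSettings, **_):
    # Replace the default annotations-based diff-base storage with the custom one.
    settings.persistence.diffbase_storage = SqliteDiffBaseStorage()
```

A local file is only truly persistent if it lives on a persistent volume; a real external database (Redis, Postgres, etc.) would follow the same fetch/store pattern. In any case, verify the exact method signatures against the linked diffbase.py for your Kopf version.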
If the size of a CR is so big that even the response traffic (the resulting patched object returned after a PATCH operation) is an issue, you can also implement your own progress storage (with its fetch/store/purge methods). Then the patches will be gone completely, and Kopf will not patch anything for its own needs.
The patches exist primarily for these two state storages, so that Kopf works "out of the box" in 99.9% of cases without any 3rd-party dependencies, using Kubernetes itself as the database (kind of). Overriding these two storages is the officially documented and supported way of avoiding those patches.
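For completeness, a similar sketch for the progress storage, under the same assumptions: the keyword-only fetch/store/purge/touch methods as found in Kopf's progress-storage base class (double-check the exact abstract method set for your Kopf version), with all names again being placeholders:

```python
import json
import sqlite3

import kopf


class SqliteProgressStorage(kopf.ProgressStorage):
    """Keep the per-handler progress records in a local SQLite file instead of
    annotations, so Kopf does not need to PATCH objects for its own bookkeeping."""

    def __init__(self, path='/var/lib/my-operator/progress.sqlite'):
        super().__init__()
        self._db = sqlite3.connect(path, check_same_thread=False)
        self._db.execute(
            'CREATE TABLE IF NOT EXISTS progress '
            '(uid TEXT, handler TEXT, record TEXT, PRIMARY KEY (uid, handler))')

    def fetch(self, *, key, body):
        row = self._db.execute(
            'SELECT record FROM progress WHERE uid = ? AND handler = ?',
            (body['metadata']['uid'], str(key))).fetchone()
        return json.loads(row[0]) if row else None

    def store(self, *, key, record, body, patch):
        self._db.execute(
            'INSERT OR REPLACE INTO progress (uid, handler, record) VALUES (?, ?, ?)',
            (body['metadata']['uid'], str(key), json.dumps(record)))
        self._db.commit()

    def purge(self, *, key, body, patch):
        self._db.execute(
            'DELETE FROM progress WHERE uid = ? AND handler = ?',
            (body['metadata']['uid'], str(key)))
        self._db.commit()

    def touch(self, *, body, patch, value):
        pass  # this sketch writes no dummy wake-up annotations to the object


@kopf.on.startup()
def configure_progress(settings: kopf.OperatorSettings, **_):
    # In a real operator, this would typically be combined with the
    # diff-base storage override from the previous sketch.
    settings.persistence.progress_storage = SqliteProgressStorage()
```

Be aware that the built-in storages also use tiny "touch" patches to wake the operator up after sleeps and delayed retries; a no-op touch() skips that, so check whether your handlers rely on it before going this route.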
So, if I understand it correctly, the CRs will contain the entire last-handled configuration, so Kopf will send double-sized CRs to the apiserver, which can cause the problem from the linked issue. I don't think I have this problem, because I don't have large CRs. I have something like this in my metadata field after creation:
metadata:
annotations:
operator.l7mp.io/kopf-managed: "yes"
operator.l7mp.io/last-handled-configuration: |
{"spec":{"listener":{"rules":[{"action":{"rewrite":[{"path":"/labels/callid","valueStr":"31103112"},{"path":"/labels/tag","valueStr":"from-tag3110"}],"route":{"destinationRef":"/l7mp.io/v1/Target/default/ingress-rtcp-target","retry":{"num_retries":5,"retry_on":"always","timeout":250}}}}],"spec":{"UDP":{"options":{"mode":"server"},"port":10785}}},"selector":{"matchLabels":{"app":"l7mp-ingress"}},"updateOwners":true}}
creationTimestamp: "2021-07-14T08:58:57Z"
finalizers:
- operator.l7mp.io/kopf-finalizer
generation: 1
name: ingress-rtcp-vsvc-31103112-fromtag3110
namespace: default
resourceVersion: "19657"
uid: a955033e-6d07-42ef-b407-d2c8b220f49f
That's not so big.
I have another question: what does the kopf-managed = "yes" annotation mean? Sorry, but I can't find in the documentation or in the code what it means.
So is it possible for the operator not to make PATCH requests at all and to store the status in an external database?
No, Kopf only sends the patch itself, i.e. 1x the size; the API server then responds with the resulting patched object, so 2x the size in total.
Yes, your CR is small; the size should not be the problem here. Something else is.
What does the kopf-managed = "yes" annotation mean?
kopf-managed = "yes" is Kopf's way of marking annotations as Kopf-produced so that other Kopf-based operators handling the same resources can recognize them. It is only added when there is no *.kopf.zalando.org/ prefix in the annotation names, since Kopf still needs to know which annotations are Kopf-managed, to prevent a "ping-pong flood" that could kill the API servers if 2+ Kopf-based operators meet on the same resources.
See #539 for the very detailed explanation. Specifically, the paragraph with "2. Arbitrary prefixes".
So is it possible for the operator not to make PATCH requests at all and to store the status in an external database?
Absolutely correct.
Instead of PATCH requests to Kubernetes, the operator will be making requests/connections to some other database of any kind. (If you do so, make sure that an async client is used for that database, or that the sync calls are executed in thread pools via asyncio executors, as with any async/await code, basically.)