
Regularly see E0164 - timeout or abort while handling: method GET URI /apis/projectcalico.org/v3

mikementzmaersk opened this issue 2 years ago

Hi, we regularly see the errors below on our kube-apiservers, which suggest something is not right with the aggregated v3 APIs. However, we don't see any negative impact: Calico is working fine, and the apiserver pods in the calico-apiserver namespace all seem fine.
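
For reference, these are the kinds of checks that all come back healthy (a sketch; resource names assume a standard tigera-operator install):

kubectl get tigerastatus apiserver
kubectl get apiservice v3.projectcalico.org
kubectl get pods -n calico-apiserver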

We removed the custom resource for the default APIServer and put it back; no difference.

Should we be concerned?

Expected Behavior

No errors in logs - errors in kube-apiserver logs attract our attention quickly ;-)

Current Behavior

wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/profiles?allowWatchBookmarks=true&resourceVersion=73417144%2F73417144&watch=true" audit-ID="c17b2ff6-c3a8-4698-a9b5-58b9dc6bf56a"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/globalnetworksets?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="e853832c-3508-48a7-b5a8-f3e787df9c94"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/hostendpoints?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="31ef1042-a457-4510-997a-26cd306e0551"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/bgppeers?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="a4bd1b53-156b-4f62-b8d7-8413283d9a2b"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/kubecontrollersconfigurations?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="540af559-1cf8-47ed-9a03-6165a016901b"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/ipamconfigurations?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="d0ce56d5-209b-4241-8442-787ce6903d50"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/clusterinformations?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="f67a7af1-d9cf-4110-9c56-80ee4a87c545"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/felixconfigurations?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="bf03ef4f-73f4-4ae0-aeb4-30f8c1dacad4"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/networksets?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="5a855bef-247b-4214-8909-4eedf83d5ec1"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/globalnetworkpolicies?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="fd9c9329-8a19-4bce-a1b3-5863aefeb72e"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/ippools?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="f1972004-bc13-43c0-8f97-701fae5f1f91"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/caliconodestatuses?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="aa809b3f-23bd-41f9-9713-43a90270c736"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/blockaffinities?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="34ef13d1-658b-4141-bea3-4c0c5ab09a69"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/networkpolicies?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="ce6fe12c-039c-4005-a0f3-f6e0d46d7369"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/bgpconfigurations?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="9857ed1e-e75f-4c4b-87a1-c06210a0eea1"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/ipreservations?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="43db0f4e-0d78-42fe-b2ef-4442fed68325"

Steps to Reproduce (for bugs)

  1. Fresh K8s 1.25 Cluster
  2. Install tigera-operator
  3. Add default Installation
  4. Add default APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
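
Applied with something like the standard operator flow (a sketch; the manifest URL assumes the v3.25.1 release layout, and apiserver.yaml is a hypothetical file holding the APIServer resource above):

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.1/manifests/tigera-operator.yaml
kubectl apply -f apiserver.yaml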

Your Environment

  • Calico version 3.25.1, but we have seen this on earlier versions on other clusters.
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes 1.25

mikementzmaersk avatar Jun 15 '23 00:06 mikementzmaersk

This might not necessarily be a problem - I believe watch=true requests have an inherent timeout after which they are terminated due to inactivity.

Does this happen roughly once per minute for each of those URIs? I believe that's the default timeout.
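
(You can observe the server-side watch timeout directly by requesting a short one; a sketch against the raw API, with an arbitrary 30s value:)

# ask the server to end the watch after 30 seconds; the stream then closes
# from the server side, which is the normal watch lifecycle
kubectl get --raw "/apis/projectcalico.org/v3/profiles?watch=true&timeoutSeconds=30"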

caseydavenport avatar Jun 23 '23 18:06 caseydavenport

Hi @caseydavenport - every 10 minutes without fail.

mikementzmaersk avatar Jun 27 '23 04:06 mikementzmaersk

It would be good to make those logs go away - I will see what can be done about that... but I think it's safe to say these are benign logs and essentially working as expected, since long-lived "watch" requests against the API server that receive no traffic will eventually be terminated and restarted by design.
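
For context, the server-side lifetime of a watch is governed by a kube-apiserver flag (paraphrasing the flag reference, so treat as approximate):

# --min-request-timeout int     Default: 1800
#   Only honored by the watch request handler, which picks a randomized
#   value above this number as the connection timeout, to spread out load.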

caseydavenport avatar Jun 27 '23 22:06 caseydavenport

Thanks Casey - that was our assumption as well and glad to have it confirmed. So not an actual error really 😉

mikementzmaersk avatar Jul 01 '23 08:07 mikementzmaersk

We are seeing these as well in our kube-apiserver logs. Is there a way to disable them? They cause a massive amount of spam over time, and if they are benign, why are they logged as WARN or even ERROR sometimes?

marchermans avatar Nov 07 '23 08:11 marchermans

Is there any good solution?

leejoyful avatar Jun 18 '24 05:06 leejoyful

Hello there,

I'm currently having this issue. Is there any workaround or roadmap for this? My environment: Calico 3.28.0, Kubernetes 1.29.6

urcus avatar Jul 09 '24 07:07 urcus

anything changed on this?

davhdavh avatar Nov 27 '24 06:11 davhdavh

I believe @ti-afra is working on better logging and log level configuration for the API server, which may solve this.

caseydavenport avatar Dec 02 '24 23:12 caseydavenport

I have the same issue :( every 10 minutes

tgajdo avatar Jan 30 '25 09:01 tgajdo

Same issue

E0217 07:27:48.840253 1 wrap.go:53] "Timeout or abort while handling" logger="UnhandledError" method="GET" URI="/apis/projectcalico.org/v3/tiers?allowWatchBookmarks=true&resourceVersion=40565403&watch=true" auditID="1ba02dd4-d0ca-4662-b2ea-3fe7f2c877e2"
E0217 07:27:48.840310 1 wrap.go:53] "Timeout or abort while handling" logger="UnhandledError" method="GET" URI="/apis/projectcalico.org/v3/ipamconfigurations?allowWatchBookmarks=true&resourceVersion=40565402&watch=true" auditID="fcf771b1-d5bc-4cd9-b158-f861dd1ac969"
E0217 07:27:48.842002 1 wrap.go:53] "Timeout or abort while handling" logger="UnhandledError" method="GET" URI="/apis/projectcalico.org/v3/blockaffinities?allowWatchBookmarks=true&resourceVersion=40565404&watch=true" auditID="b906cc08-c7b2-41a9-8a18-8b1e197729b6"

Anton-Kartashev avatar Feb 17 '25 07:02 Anton-Kartashev

Filed an upstream bug; finding it very hard to track down the relevant code in k8s in order to figure out if we're doing something wrong here: https://github.com/kubernetes/kubernetes/issues/130410

fasaxc avatar Feb 25 '25 11:02 fasaxc

I spun up a cluster with the Calico API server to try to reproduce this:

$ kubectl version
Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.31.7
calicoVersion: v3.29.3

After spinning up the cluster and leaving it sitting idle, I do not see that error. However, if I

  • Watch a resource with kubectl get profiles.projectcalico.org -w
  • Ctrl-C that watch.

Then I get the error immediately.

E0407 10:32:19.006721       1 wrap.go:53] "Timeout or abort while handling" logger="UnhandledError" method="GET" URI="/apis/projectcalico.org/v3/profiles?resourceVersion=6226%2F6226&watch=true" auditID="d91ea927-9dd9-44b8-a5fd-c2f9f4fa6beb"

I did not see the error after leaving kubectl watching the resource for >10min. I only see the error when I terminate the watch request from the client side.
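
Condensed into a script, the repro looks like this (a sketch):

# start a watch against the aggregated API in the background
kubectl get profiles.projectcalico.org -w &
WATCH_PID=$!
sleep 2
# close the connection from the client side, as Ctrl-C would;
# the kube-apiserver logs the error immediately
kill -INT $WATCH_PID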

fasaxc avatar Apr 07 '25 10:04 fasaxc

@mikementzmaersk @Anton-Kartashev @tgajdo @davhdavh @urcus do you have anything in your cluster that might be watching calico resources with a 10-minute timeout? Calico itself generally watches our internal CRD resources so it'd likely be something external. For example, perhaps you're running Argo or some other component that monitors everything in the cluster.

fasaxc avatar Apr 07 '25 10:04 fasaxc

Also, what type of cluster is this: EKS? AKS? RKE? MKE? I think the problem is triggered by something outside Calico and k8s; would like to confirm that.

If you can crank the kube-apiserver's log level up to 7, then I think it'd log who was making the requests. That can be done with the -v=7 flag when running the API server. Note: this is performance-impacting, so don't do it in prod.
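
On a kubeadm-style cluster that's an edit to the static pod manifest (a sketch; the path is the kubeadm default and the kubelet restarts the pod on save):

# on a control-plane node:
sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
# add under the kube-apiserver command:
#   - --v=7
# remember to revert afterwards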

fasaxc avatar Apr 09 '25 10:04 fasaxc

Also, what type of cluster is this EKS? AKS? RKE? MKE?

Self-hosted, kubeadm-based. Version 1.32.3. I can't access the Calico API via curl from the master node:

curl -k -H "Authorization: Bearer [redacted]" "https://[redacted]:6443/apis/projectcalico.org/v3/kubecontrollersconfigurations?allowWatchBookmarks=true&resourceVersion=62769906&watch=true"

I get a timeout.

curl https://[redacted]:6443/ works fine and shows the Calico APIs in the list:

"/apis/projectcalico.org", "/apis/projectcalico.org/v3",

Anton-Kartashev avatar Apr 09 '25 10:04 Anton-Kartashev

Yep, watch is a long-running operation so curl will time out. Are you running Argo or some other CI/CD app in your cluster?

I'm looking for an app that is watching "everything" and causing these timeouts.

fasaxc avatar Apr 09 '25 10:04 fasaxc

Yep, watch is a long-running operation so curl will time out. Are you running Argo or some other CI/CD app in your cluster?

I'm looking for an app that is watching "everything" and causing these timeouts.

Yes, we have argocd.

Anton-Kartashev avatar Apr 09 '25 10:04 Anton-Kartashev

Hmm, I omitted &watch=true and the connection was successful and produced some output. So this is not a timeout; the k8s API is waiting for a new event. Btw, the calico-apiserver SA can't list that endpoint:

"message": "kubecontrollersconfigurations.projectcalico.org is forbidden: User "system:serviceaccount:calico-apiserver:calico-apiserver" cannot list resource "kubecontrollersconfigurations" in API group "projectcalico.org" at the cluster scope",

Anton-Kartashev avatar Apr 09 '25 12:04 Anton-Kartashev

Self-hosted cluster. Uses argocd.

davhdavh avatar Apr 10 '25 04:04 davhdavh

I've submitted a patch upstream to fix the issue: https://github.com/kubernetes/kubernetes/pull/131245

It comes down to the proxying logic in the API server raising a particular error that is normally only raised for timed-out non-watch requests. That gets logged because, for those requests, it indicates an overload. However, for a watch request it typically means "client closed connection".

fasaxc avatar Apr 10 '25 14:04 fasaxc

@Anton-Kartashev @davhdavh is there a way that you can confirm that Argo is causing the timeouts (perhaps disable Argo for 60s to see if the errors stop; there's a sketch after this list)? I think there are two bugs here:

  • Upstream bug in k8s API server making the spammy log.
  • Bug in whatever is timing out after 10s (perhaps Argo). Watches should last a lot longer than that; 5 minutes + 100% jitter is the default. If the request times out server-side (instead of the client closing the connection early) then the error doesn't occur.
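
One quick way to test the Argo theory (a sketch; the namespace and workload name assume a recent stock Argo CD install, so adjust if yours differs):

# stop the Argo CD application controller for a minute or two...
kubectl -n argocd scale statefulset argocd-application-controller --replicas=0
# ...watch the kube-apiserver logs; if the errors stop, scale back up
kubectl -n argocd scale statefulset argocd-application-controller --replicas=1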

fasaxc avatar Apr 14 '25 08:04 fasaxc

@fasaxc hello, I can confirm this. After disabling the argocd pods I don't see any errors in the kube-apiserver pod. Will investigate further.

Anton-Kartashev avatar Apr 17 '25 09:04 Anton-Kartashev

Thanks for the update @Anton-Kartashev; that suggests that the version of Argo you're using has a bug where it is timing out watch requests every 10s on the client side. For normal (non-aggregated) resources, the API server doesn't log anything when that happens, but for aggregated resources you get this error.

fasaxc avatar Apr 22 '25 10:04 fasaxc

Just letting you know, we are facing the same log spam here.

Filtering our logs for ipamconfigurations, we get about 75 entries per hour:

E0930 18:07:50.541118       1 wrap.go:53] "Timeout or abort while handling" logger="UnhandledError" method="GET" URI="/apis/projectcalico.org/v3/ipamconfigurations?allowWatchBookmarks=true&resourceVersion=43043378&watch=true" auditID="c90b8684-fd1f-4086-88b6-350942100342"

Filtering for projectcalico.org (all CRDs), we get about 1k hits per hour.
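
(Counted with something along these lines; the pod name is a placeholder:)

kubectl -n kube-system logs kube-apiserver-<control-plane-node> --since=1h | grep -c 'projectcalico.org'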

The cluster generating this has about 200 nodes.

jonatasbaldin avatar Sep 30 '25 18:09 jonatasbaldin