
Regularly see E0164 - timeout or abort while handling: method GET URI /apis/projectcalico.org/v3

mikementzmaersk opened this issue 2 years ago

Hi, we regularly see the errors below on our kube-apiservers, which suggest something is not right with the aggregated v3 APIs. However, we don't see any negative impact: Calico is working fine, and the apiserver pods in the calico-apiserver namespace all seem fine.
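
For reference, these are the kinds of checks that all come back healthy (a sketch; resource names assume a standard tigera-operator install):

kubectl get tigerastatus apiserver
kubectl get apiservice v3.projectcalico.org
kubectl get pods -n calico-apiserver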

We removed the custom resource for the default APIServer and put it back; no difference.

Should we be concerned?

Expected Behavior

No errors in logs - errors in kube-apiserver logs attract our attention quickly ;-)

Current Behavior

wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/profiles?allowWatchBookmarks=true&resourceVersion=73417144%2F73417144&watch=true" audit-ID="c17b2ff6-c3a8-4698-a9b5-58b9dc6bf56a"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/globalnetworksets?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="e853832c-3508-48a7-b5a8-f3e787df9c94"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/hostendpoints?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="31ef1042-a457-4510-997a-26cd306e0551"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/bgppeers?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="a4bd1b53-156b-4f62-b8d7-8413283d9a2b"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/kubecontrollersconfigurations?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="540af559-1cf8-47ed-9a03-6165a016901b"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/ipamconfigurations?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="d0ce56d5-209b-4241-8442-787ce6903d50"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/clusterinformations?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="f67a7af1-d9cf-4110-9c56-80ee4a87c545"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/felixconfigurations?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="bf03ef4f-73f4-4ae0-aeb4-30f8c1dacad4"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/networksets?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="5a855bef-247b-4214-8909-4eedf83d5ec1"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/globalnetworkpolicies?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="fd9c9329-8a19-4bce-a1b3-5863aefeb72e"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/ippools?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="f1972004-bc13-43c0-8f97-701fae5f1f91"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/caliconodestatuses?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="aa809b3f-23bd-41f9-9713-43a90270c736"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/blockaffinities?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="34ef13d1-658b-4141-bea3-4c0c5ab09a69"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/networkpolicies?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="ce6fe12c-039c-4005-a0f3-f6e0d46d7369"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/bgpconfigurations?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="9857ed1e-e75f-4c4b-87a1-c06210a0eea1"
wrap.go:53] timeout or abort while handling: method=GET URI="/apis/projectcalico.org/v3/ipreservations?allowWatchBookmarks=true&resourceVersion=73417144&watch=true" audit-ID="43db0f4e-0d78-42fe-b2ef-4442fed68325"

Steps to Reproduce (for bugs)

  1. Fresh K8s 1.25 Cluster
  2. Install tigera-operator
  3. Add default Installation
  4. Add default APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
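
Applied with something like the standard operator flow (a sketch; the manifest URL assumes the v3.25.1 release layout, and apiserver.yaml is a hypothetical file holding the APIServer resource above):

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.1/manifests/tigera-operator.yaml
kubectl apply -f apiserver.yaml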

Your Environment

  • Calico version 3.25.1, but we have seen this on earlier versions on other clusters.
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes 1.25

mikementzmaersk avatar Jun 15 '23 00:06 mikementzmaersk

This might not necessarily be a problem - I believe watch=true requests have an inherent timeout after which they are terminated due to inactivity.

Does this happen roughly once per minute for each of those URIs? I believe that's the default timeout.
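
(You can observe the server-side watch timeout directly by requesting a short one; a sketch against the raw API, with an arbitrary 30s value:)

# ask the server to end the watch after 30 seconds; the stream then closes
# from the server side, which is the normal watch lifecycle
kubectl get --raw "/apis/projectcalico.org/v3/profiles?watch=true&timeoutSeconds=30"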

caseydavenport avatar Jun 23 '23 18:06 caseydavenport

Hi @caseydavenport - every 10 minutes without fail.

mikementzmaersk avatar Jun 27 '23 04:06 mikementzmaersk

It would be good to make those logs go away - I will see what can be done about that... but I think it's safe to say these are benign logs and essentially working as expected, since long-lived "watch" requests against the API server that receive no traffic will eventually be terminated and restarted by design.
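
For context, the server-side lifetime of a watch is governed by a kube-apiserver flag (paraphrasing the flag reference, so treat as approximate):

# --min-request-timeout int     Default: 1800
#   Only honored by the watch request handler, which picks a randomized
#   value above this number as the connection timeout, to spread out load.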

caseydavenport avatar Jun 27 '23 22:06 caseydavenport

Thanks Casey - that was our assumption as well and glad to have it confirmed. So not an actual error really 😉

mikementzmaersk avatar Jul 01 '23 08:07 mikementzmaersk

We are seeing these as well in our kube-apiserver logs. Is there a way to disable them? They cause a massive amount of spam over time, and if they are benign, why are they logged as WARN or even ERROR sometimes?

marchermans avatar Nov 07 '23 08:11 marchermans

Is there any good solution?

leejoyful avatar Jun 18 '24 05:06 leejoyful

Hello there,

I'm currently having this issue. Is there any workaround or roadmap for this? My environment: Calico 3.28.0, Kubernetes 1.29.6

urcus avatar Jul 09 '24 07:07 urcus

anything changed on this?

davhdavh avatar Nov 27 '24 06:11 davhdavh

I believe @ti-afra is working on better logging and log level configuration for the API server, which may solve this.

caseydavenport avatar Dec 02 '24 23:12 caseydavenport

I have the same issue :( every 10 minutes

tgajdo avatar Jan 30 '25 09:01 tgajdo

Same issue

E0217 07:27:48.840253 1 wrap.go:53] "Timeout or abort while handling" logger="UnhandledError" method="GET" URI="/apis/projectcalico.org/v3/tiers?allowWatchBookmarks=true&resourceVersion=40565403&watch=true" auditID="1ba02dd4-d0ca-4662-b2ea-3fe7f2c877e2"
E0217 07:27:48.840310 1 wrap.go:53] "Timeout or abort while handling" logger="UnhandledError" method="GET" URI="/apis/projectcalico.org/v3/ipamconfigurations?allowWatchBookmarks=true&resourceVersion=40565402&watch=true" auditID="fcf771b1-d5bc-4cd9-b158-f861dd1ac969"
E0217 07:27:48.842002 1 wrap.go:53] "Timeout or abort while handling" logger="UnhandledError" method="GET" URI="/apis/projectcalico.org/v3/blockaffinities?allowWatchBookmarks=true&resourceVersion=40565404&watch=true" auditID="b906cc08-c7b2-41a9-8a18-8b1e197729b6"

Anton-Kartashev avatar Feb 17 '25 07:02 Anton-Kartashev

Filed an upstream bug; finding it very hard to track down the relevant code in k8s in order to figure out if we're doing something wrong here: https://github.com/kubernetes/kubernetes/issues/130410

fasaxc avatar Feb 25 '25 11:02 fasaxc

I spun up a cluster with the Calico API server to try to reproduce this:

$ kubectl version
Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.31.7
calicoVersion: v3.29.3

After spinning up the cluster and leaving it sitting idle, I do not see that error. However, if I

  • Watch a resource with kubectl get profiles.projectcalico.org -w
  • Ctrl-C that watch.

Then I get the error immediately.

E0407 10:32:19.006721       1 wrap.go:53] "Timeout or abort while handling" logger="UnhandledError" method="GET" URI="/apis/projectcalico.org/v3/profiles?resourceVersion=6226%2F6226&watch=true" auditID="d91ea927-9dd9-44b8-a5fd-c2f9f4fa6beb"

I did not see the error after leaving kubectl watching the resource for >10min. I only see the error when I terminate the watch request from the client side.
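
Condensed into a script, the repro looks like this (a sketch):

# start a watch against the aggregated API in the background
kubectl get profiles.projectcalico.org -w &
WATCH_PID=$!
sleep 2
# close the connection from the client side, as Ctrl-C would;
# the kube-apiserver logs the error immediately
kill -INT $WATCH_PID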

fasaxc avatar Apr 07 '25 10:04 fasaxc

@mikementzmaersk @Anton-Kartashev @tgajdo @davhdavh @urcus do you have anything in your cluster that might be watching calico resources with a 10-minute timeout? Calico itself generally watches our internal CRD resources so it'd likely be something external. For example, perhaps you're running Argo or some other component that monitors everything in the cluster.

fasaxc avatar Apr 07 '25 10:04 fasaxc

Also, what type of cluster is this: EKS? AKS? RKE? MKE? I think the problem is triggered by something outside Calico and k8s; would like to confirm that.

If you can crank the kube-apiserver's log level up to 7, then I think it'd log who was making the requests. That can be done with the -v=7 flag when running the API server. Note: this is performance-impacting, so don't do it in prod.
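
On a kubeadm-style cluster that's an edit to the static pod manifest (a sketch; the path is the kubeadm default and the kubelet restarts the pod on save):

# on a control-plane node:
sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
# add under the kube-apiserver command:
#   - --v=7
# remember to revert afterwards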

fasaxc avatar Apr 09 '25 10:04 fasaxc

Also, what type of cluster is this EKS? AKS? RKE? MKE?

Self-hosted, kubeadm-based. Version 1.32.3. I can't access the Calico API via curl from the master node:

curl -k -H "Authorization: Bearer [redacted]" "https://[redacted]:6443/apis/projectcalico.org/v3/kubecontrollersconfigurations?allowWatchBookmarks=true&resourceVersion=62769906&watch=true"

I get a timeout.

curl https://[redacted]:6443/ works fine and shows the Calico APIs in the list:

"/apis/projectcalico.org", "/apis/projectcalico.org/v3",

Anton-Kartashev avatar Apr 09 '25 10:04 Anton-Kartashev

Yep, watch is a long-running operation so curl will time out. Are you running Argo or some other CI/CD app in your cluster?

I'm looking for an app that is watching "everything" and causing these timeouts.

fasaxc avatar Apr 09 '25 10:04 fasaxc

Yep, watch is a long-running operation so curl will time out. Are you running Argo or some other CI/CD app in your cluster?

I'm looking for an app that is watching "everything" and causing these timeouts.

Yes, we have argocd.

Anton-Kartashev avatar Apr 09 '25 10:04 Anton-Kartashev

Hmm, I omitted &watch=true and the connection was successful and produced some output. So this is not a timeout; the k8s API is waiting for a new event. Btw, the calico-apiserver SA can't list that endpoint:

"message": "kubecontrollersconfigurations.projectcalico.org is forbidden: User "system:serviceaccount:calico-apiserver:calico-apiserver" cannot list resource "kubecontrollersconfigurations" in API group "projectcalico.org" at the cluster scope",

Anton-Kartashev avatar Apr 09 '25 12:04 Anton-Kartashev

Self-hosted cluster. Uses argocd.

davhdavh avatar Apr 10 '25 04:04 davhdavh

I've submitted a patch upstream to fix the issue: https://github.com/kubernetes/kubernetes/pull/131245

It comes down to the proxying logic in the API server raising a particular error that is normally only raised for timed-out non-watch requests. That gets logged because, for those requests, it indicates an overload. However, for a watch request it typically means "client closed connection".

fasaxc avatar Apr 10 '25 14:04 fasaxc

@Anton-Kartashev @davhdavh is there a way that you can confirm that Argo is causing the timeouts (perhaps disable Argo for 60s to see if the errors stop; there's a sketch after this list)? I think there are two bugs here:

  • Upstream bug in k8s API server making the spammy log.
  • Bug in whatever is timing out after 10s (perhaps Argo). Watches should last a lot longer than that; 5 minutes + 100% jitter is the default. If the request times out server-side (instead of the client closing the connection early) then the error doesn't occur.
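
One quick way to test the Argo theory (a sketch; the namespace and workload name assume a recent stock Argo CD install, so adjust if yours differs):

# stop the Argo CD application controller for a minute or two...
kubectl -n argocd scale statefulset argocd-application-controller --replicas=0
# ...watch the kube-apiserver logs; if the errors stop, scale back up
kubectl -n argocd scale statefulset argocd-application-controller --replicas=1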

fasaxc avatar Apr 14 '25 08:04 fasaxc

@fasaxc hello, I can confirm this. After disabling the argocd pods I don't see any errors in the kube-apiserver pod. Will investigate further.

Anton-Kartashev avatar Apr 17 '25 09:04 Anton-Kartashev

Thanks for the update @Anton-Kartashev; that suggests that the version of Argo you're using has a bug where it is timing out watch requests every 10s on the client side. For normal (non-aggregated) resources, the API server doesn't log anything when that happens, but for aggregated resources you get this error.

fasaxc avatar Apr 22 '25 10:04 fasaxc

Just letting you know, we are facing the same log spam here.

Filtering our logs for ipamconfigurations, we get about 75 entries per hour:

E0930 18:07:50.541118       1 wrap.go:53] "Timeout or abort while handling" logger="UnhandledError" method="GET" URI="/apis/projectcalico.org/v3/ipamconfigurations?allowWatchBookmarks=true&resourceVersion=43043378&watch=true" auditID="c90b8684-fd1f-4086-88b6-350942100342"

Filtering for projectcalico.org (all CRDs), we get about 1k hits per hour.
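
(Counted with something along these lines; the pod name is a placeholder:)

kubectl -n kube-system logs kube-apiserver-<control-plane-node> --since=1h | grep -c 'projectcalico.org'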

The cluster generating this has about 200 nodes.

jonatasbaldin avatar Sep 30 '25 18:09 jonatasbaldin