linkerd2
HTTPRoute `status` field not populated at all, or takes a very long time
What is the issue?
HTTPRoutes are not being picked up by Linkerd, so the `status` field is either never populated or takes a very long time to appear, sometimes tens of minutes.
The policy controller container in the destination pod keeps logging "Failed to patch HTTPRoute" errors with reason `NotFound`.
The `policy` container's memory usage is also quite high (several gigabytes) compared to the other components.
When we create a large number of HTTPRoutes, say 1000, memory usage climbs steeply until it hits the limit (16Gi in our case) and the container is OOMKilled.
It returns to a normal working state after a restart:
kubectl rollout restart -n linkerd deployment linkerd-destination
How can it be reproduced?
Create new HTTPRoutes, or constantly update/delete existing ones. Here is an existing HTTPRoute for which we nevertheless see NotFound errors (logs copied below):
kind: HTTPRoute
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"policy.linkerd.io/v1beta2","kind":"HTTPRoute","metadata":{"annotations":{},"labels":{"app.kubernetes.io/managed-by":"kustomize","app.kubernetes.io/name":"my-controller","app.kubernetes.io/part-of":"my-app"},"name":"controller-route-default","namespace":"my-sandbox"},"spec":{"parentRefs":[{"group":"core","kind":"Service","name":"my-controller","port":5051}]}}
  creationTimestamp: '2024-03-21T08:01:56Z'
  generation: 1
  labels:
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: my-controller
    app.kubernetes.io/part-of: my-app
  managedFields:
    - apiVersion: policy.linkerd.io/v1beta2
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:labels:
            .: {}
            f:app.kubernetes.io/managed-by: {}
            f:app.kubernetes.io/name: {}
            f:app.kubernetes.io/part-of: {}
        f:spec:
          .: {}
          f:parentRefs: {}
          f:rules: {}
      manager: kubectl-client-side-apply
      operation: Update
      time: '2024-03-21T08:01:56Z'
    - apiVersion: policy.linkerd.io/v1beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:parents: {}
      manager: policy.linkerd.io
      operation: Update
      subresource: status
      time: '2024-03-21T10:32:10Z'
  name: controller-route-default
  namespace: my-sandbox
  resourceVersion: '647402861'
  uid: 2b3b0205-3f98-4f9f-a183-5c637e8f057b
  selfLink: >-
    /apis/policy.linkerd.io/v1beta3/namespaces/my-sandbox/httproutes/controller-route-default
status:
  parents:
    - conditions:
        - lastTransitionTime: '2024-03-21T10:21:58Z'
          message: ''
          reason: Accepted
          status: 'True'
          type: Accepted
        - lastTransitionTime: '2024-03-21T10:21:58Z'
          message: ''
          reason: BackendNotFound
          status: 'False'
          type: ResolvedRefs
      controllerName: linkerd.io/policy-controller
      parentRef:
        group: core
        kind: Service
        name: my-controller
        namespace: my-sandbox
spec:
  parentRefs:
    - group: core
      kind: Service
      name: my-controller
      port: 5051
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
A new HTTPRoute whose status was not populated until we restarted the linkerd-destination pod:
kind: HTTPRoute
metadata:
  creationTimestamp: '2024-03-21T08:22:06Z'
  generation: 1
  managedFields:
    - apiVersion: policy.linkerd.io/v1beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:parentRefs: {}
          f:rules: {}
      manager: fabric8
      operation: Apply
      time: '2024-03-21T08:22:06Z'
    - apiVersion: policy.linkerd.io/v1beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:parents: {}
      manager: policy.linkerd.io
      operation: Update
      subresource: status
      time: '2024-03-21T10:32:04Z'
  name: controller-route-user-476722
  namespace: my-sandbox
  resourceVersion: '647402570'
  uid: 5b09c67e-2cab-4e07-8dee-75049e6f1812
  selfLink: >-
    /apis/policy.linkerd.io/v1beta3/namespaces/my-sandbox/httproutes/controller-route-user-476722
status:
  parents:
    - conditions:
        - lastTransitionTime: '2024-03-21T10:21:52Z'
          message: ''
          reason: Accepted
          status: 'True'
          type: Accepted
        - lastTransitionTime: '2024-03-21T10:21:52Z'
          message: ''
          reason: ResolvedRefs
          status: 'True'
          type: ResolvedRefs
      controllerName: linkerd.io/policy-controller
      parentRef:
        group: core
        kind: Service
        name: my-controller
        namespace: my-sandbox
spec:
  parentRefs:
    - group: core
      kind: Service
      name: my-controller
      port: 5051
  rules:
    - backendRefs:
        - group: core
          kind: Service
          name: my-app-0
          port: 3004
          weight: 1
      matches:
        - headers:
            - name: x-user-id
              type: Exact
              value: '476722'
          path:
            type: PathPrefix
            value: /
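For anyone trying to reproduce the "create many HTTPRoutes" scenario, a rough sketch of a generator for routes shaped like `controller-route-user-476722` might look like the following. The helper names `user_route` and `route_list`, the sequential user IDs, and the use of a `v1` `List` wrapper are assumptions made for illustration; only the field values are taken from the manifest above.

```python
import json


def user_route(user_id: str, namespace: str = "my-sandbox") -> dict:
    """Build an HTTPRoute manifest shaped like controller-route-user-476722."""
    return {
        "apiVersion": "policy.linkerd.io/v1beta3",
        "kind": "HTTPRoute",
        "metadata": {
            "name": f"controller-route-user-{user_id}",
            "namespace": namespace,
        },
        "spec": {
            "parentRefs": [
                {"group": "core", "kind": "Service",
                 "name": "my-controller", "port": 5051}
            ],
            "rules": [
                {
                    "backendRefs": [
                        {"group": "core", "kind": "Service",
                         "name": "my-app-0", "port": 3004, "weight": 1}
                    ],
                    "matches": [
                        {
                            "headers": [
                                {"name": "x-user-id", "type": "Exact",
                                 "value": user_id}
                            ],
                            "path": {"type": "PathPrefix", "value": "/"},
                        }
                    ],
                }
            ],
        },
    }


def route_list(count: int) -> dict:
    """Wrap `count` routes (hypothetical IDs 1..count) in a single v1 List."""
    items = [user_route(str(i)) for i in range(1, count + 1)]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Emit JSON on stdout so it can be piped into kubectl.
    print(json.dumps(route_list(1000)))
```

Saved under a hypothetical name such as `gen_routes.py`, the output could be applied with `python gen_routes.py | kubectl apply -f -`, since `kubectl apply` accepts JSON as well as YAML.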
Logs, error output, etc
{"timestamp":"2024-03-20T18:43:03.987665Z","level":"INFO","fields":{"message":"Lease already exists, no need to create it"},"target":"linkerd_policy_controller"}
{"timestamp":"2024-03-20T18:43:04.019044Z","level":"INFO","fields":{"message":"policy gRPC server listening","addr":"0.0.0.0:8090"},"target":"linkerd_policy_controller","spans":[{"port":"8090","name":"grpc"}]}
{"timestamp":"2024-03-21T08:01:18.459737Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:20.918075Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:23.317981Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:25.408224Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:28.438961Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:29.134401Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:31.952588Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:34.950543Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:35.602828Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:36.520973Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:38.480231Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:41.174658Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:42.805331Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:44.907200Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:46.396935Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:48.372032Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:50.534977Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:53.104461Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T08:01:56.102044Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-default\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-default\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-default\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
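Because the policy controller emits one JSON object per line, the volume of these errors per route can be tallied with a short script. The following is only a sketch: the function name `count_patch_failures` is made up for illustration, and it assumes every relevant line has the same shape as the entries above (a `fields.route` string containing `name: "..."`).

```python
import json
import re
from collections import Counter

# Pulls the route name out of the Rust debug string in fields.route, e.g.
# GroupKindName { group: "policy.linkerd.io", kind: "HTTPRoute", name: "x" }
ROUTE_NAME = re.compile(r'name: "([^"]+)"')


def count_patch_failures(lines):
    """Count 'Failed to patch HTTPRoute' errors per (namespace, route name)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("level") != "ERROR":
            continue
        fields = entry.get("fields", {})
        if fields.get("message") != "Failed to patch HTTPRoute":
            continue
        match = ROUTE_NAME.search(fields.get("route", ""))
        if match:
            counts[(fields.get("namespace"), match.group(1))] += 1
    return counts
```

It could be called on a saved log file, for example `count_patch_failures(open("policy.log"))`, where `policy.log` is a hypothetical file holding the container's output.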
output of linkerd check -o short
linkerd check -o short
linkerd-identity
----------------
‼ issuer cert is valid for at least 60 days
issuer certificate will expire on 2024-03-23T09:34:37Z
see https://linkerd.io/2/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints
linkerd-version
---------------
‼ cli is up-to-date
is running version 24.3.2 but the latest edge version is 24.3.3
see https://linkerd.io/2/checks/#l5d-version-cli for hints
control-plane-version
---------------------
‼ control plane is up-to-date
is running version 24.3.2 but the latest edge version is 24.3.3
see https://linkerd.io/2/checks/#l5d-version-control for hints
linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
some proxies are not running the current version:
* linkerd-destination-56f85576c7-tpx4h (edge-24.3.2)
* linkerd-identity-575f48d794-9hmxb (edge-24.3.2)
* linkerd-proxy-injector-678f5b6b99-kbzkk (edge-24.3.2)
see https://linkerd.io/2/checks/#l5d-cp-proxy-version for hints
linkerd-viz
-----------
‼ linkerd-viz pods are injected
could not find proxy container for linkerd-cni-bv45t pod
see https://linkerd.io/2/checks/#l5d-viz-pods-injection for hints
‼ viz extension pods are running
container "linkerd-proxy" in pod "metrics-api-544b76757-7zk8v" is not ready
see https://linkerd.io/2/checks/#l5d-viz-pods-running for hints
‼ viz extension proxies are up-to-date
some proxies are not running the current version:
* linkerd-destination-56f85576c7-tpx4h (edge-24.3.2)
* linkerd-identity-575f48d794-9hmxb (edge-24.3.2)
* linkerd-proxy-injector-678f5b6b99-kbzkk (edge-24.3.2)
see https://linkerd.io/2/checks/#l5d-viz-proxy-cp-version for hints
Status check results are √
Environment
- Kubernetes version: v1.26.4
- Cluster Environment: on-prem, kubeadm vanilla kubernetes
- Host OS: RHEL 8.9
- Linkerd version: edge-24.3.2
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
None
Another HTTPRoute, `controller-route-user-5186`, was created and the one above, `controller-route-user-476722`, was deleted, and the policy controller keeps throwing hundreds of the same errors:
{"timestamp":"2024-03-21T12:00:12.113301Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-user-476722\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-user-476722\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-user-476722\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T12:00:12.430386Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-user-5186\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-user-5186\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-user-5186\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
It took about 40 minutes for the `status` field to be updated. Note the `creationTimestamp` and the status update time:
apiVersion: policy.linkerd.io/v1beta3
kind: HTTPRoute
metadata:
  creationTimestamp: '2024-03-21T13:09:58Z'
  generation: 1
  managedFields:
    - apiVersion: policy.linkerd.io/v1beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:parentRefs: {}
          f:rules: {}
      manager: fabric8
      operation: Apply
      time: '2024-03-21T13:09:58Z'
    - apiVersion: policy.linkerd.io/v1beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:parents: {}
      manager: policy.linkerd.io
      operation: Update
      subresource: status
      time: '2024-03-21T13:49:16Z'
  name: controller-route-user-5186
  namespace: my-sandbox
  resourceVersion: '648189883'
  uid: d350c128-3751-4b20-8f85-d0959ffa6c21
  selfLink: >-
    /apis/policy.linkerd.io/v1beta3/namespaces/my-sandbox/httproutes/controller-route-user-5186
status:
  parents:
    - conditions:
        - lastTransitionTime: '2024-03-21T13:14:16Z'
          message: ''
          reason: Accepted
          status: 'True'
          type: Accepted
        - lastTransitionTime: '2024-03-21T13:14:16Z'
          message: ''
          reason: ResolvedRefs
          status: 'True'
          type: ResolvedRefs
      controllerName: linkerd.io/policy-controller
      parentRef:
        group: core
        kind: Service
        name: my-controller
        namespace: my-sandbox
spec:
  parentRefs:
    - group: core
      kind: Service
      name: my-controller
      port: 5051
  rules:
    - backendRefs:
        - group: core
          kind: Service
          name: my-app-0
          port: 3004
          weight: 1
      matches:
        - headers:
            - name: x-user-id
              type: Exact
              value: '5186'
          path:
            type: PathPrefix
            value: /
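The "about 40 minutes" figure can be checked from the two timestamps in the manifest above. This is a small sketch, assuming the delay is measured from `creationTimestamp` to the latest `managedFields` entry written by the `policy.linkerd.io` manager on the `status` subresource; the function name `status_delay` is made up for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used in the manifest


def status_delay(route: dict, manager: str = "policy.linkerd.io") -> timedelta:
    """Time between creation and the last status write by `manager`."""
    meta = route["metadata"]
    created = datetime.strptime(meta["creationTimestamp"], FMT)
    writes = [
        datetime.strptime(entry["time"], FMT)
        for entry in meta.get("managedFields", [])
        if entry.get("manager") == manager
        and entry.get("subresource") == "status"
    ]
    if not writes:
        raise ValueError("no status write recorded for this manager")
    return max(writes) - created


# Only the fields needed for the calculation, copied from the manifest above.
route = {
    "metadata": {
        "creationTimestamp": "2024-03-21T13:09:58Z",
        "managedFields": [
            {"manager": "fabric8", "operation": "Apply",
             "time": "2024-03-21T13:09:58Z"},
            {"manager": "policy.linkerd.io", "operation": "Update",
             "subresource": "status", "time": "2024-03-21T13:49:16Z"},
        ],
    }
}
print(status_delay(route))  # 0:39:18
```

When collecting manifests with `kubectl get -o yaml`, the `managedFields` entries are only included if `--show-managed-fields` is passed.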
@aminafshar this looks like it is likely the same issue as https://github.com/linkerd/linkerd2/issues/12104 and is fixed in https://github.com/linkerd/linkerd2/pull/12215
This was fixed in https://github.com/linkerd/linkerd2/releases/tag/edge-24.3.4. Please let us know if issues persist.
@adleong, @olix0r We're now running edge-24.4.1 (Kubernetes version: v1.28.8). Resource-wise, the policy controller seems to be running normally and the memory leak appears to be resolved, but we still see a delay of several minutes between HTTPRoute creation and the `status` field update, along with lots of errors like the one below:
{"timestamp":"2024-04-15T08:18:19.231788Z","level":"ERROR","fields":{"message":"Failed to send HTTPRoute patch","id.namespace":"pangolin","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-user-4288\" }","error":"no available capacity"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Index"}]}
As you can see, memory usage has been a flat line for the last 7 hours, and the policy controller seems to be stuck in that state, throwing the same error over and over:
2024-04-15T13:53:09+03:00 {"timestamp":"2024-04-15T10:53:09.703054Z","level":"ERROR","fields":{"message":"Failed to send HTTPRoute patch","id.namespace":"pangolin","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-user-0123\" }","error":"no available capacity"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"httproutes.policy.linkerd.io"}]}
Hi @aminafshar, sorry to hear you're still experiencing this.
These error messages indicate that the policy controller is generating HTTPRoute status patches more quickly than the Kubernetes API can keep up with. The policy controller will only generate a patch for an HTTPRoute if the HTTPRoute's status is out of date and needs to be updated. I've attempted to reproduce this with 1000 HTTPRoutes, but I only see patches generated when the HTTPRoutes are first created, not continuously as you seem to be experiencing. Are HTTPRoutes being created or updated rapidly by some controller or automated process?
If you can provide the output of `linkerd diagnostics controller-metrics`, it can help us confirm what we're seeing. If you can also share the YAML-formatted output from one of these HTTPRoutes (e.g. `kubectl get httproute/X -o yaml`), we can see if anything seems unexpected about the resource itself or its status.
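As a mental model of the "no available capacity" error described above, consider a bounded queue of pending status patches: once patches are produced faster than they are drained, further sends are rejected. The toy below is only an illustration of that backpressure pattern, not the policy controller's actual implementation; the class `StatusPatchQueue`, its capacity, and the route names are hypothetical, and only the error string is taken from the logs.

```python
import queue


class StatusPatchQueue:
    """Toy bounded queue of pending status patches (illustration only)."""

    def __init__(self, capacity: int):
        self._pending = queue.Queue(maxsize=capacity)
        self.rejected = []  # (route name, error) pairs for failed sends

    def try_send(self, route_name: str) -> bool:
        """Enqueue without blocking; record a rejection if the queue is full."""
        try:
            self._pending.put_nowait(route_name)
            return True
        except queue.Full:
            self.rejected.append((route_name, "no available capacity"))
            return False

    def drain(self, n: int) -> list:
        """Simulate the API server accepting up to `n` queued patches."""
        done = []
        for _ in range(n):
            try:
                done.append(self._pending.get_nowait())
            except queue.Empty:
                break
        return done


q = StatusPatchQueue(capacity=2)
results = [q.try_send(f"controller-route-user-{i}") for i in range(4)]
print(results)                                # [True, True, False, False]
print(q.drain(1))                             # ['controller-route-user-0']
print(q.try_send("controller-route-user-4"))  # True
```

In this picture, a steady stream of rejected sends means something keeps producing patches for routes whose status never looks up to date, which is why the rate of HTTPRoute changes matters.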
Hi @adleong, I asked our developers to provide info on how they create and manage HTTPRoutes.
At the time of writing, there are about 60 HTTPRoutes on the cluster, and only a few were deleted or created recently. The linkerd-destination pods were restarted and have been running for the last ~2 hours. Logs, diagnostics output, and the YAML of some recent HTTPRoutes are attached: linkerd-diagnostics-controller-metrics.txt, policy_linkerd-destination-887769595-492pk.log, policy_linkerd-destination-887769595-hdmzp.log, policy_linkerd-destination-887769595-gttn5.log, httproutes.yml.txt
Thank you for this very helpful data. Using this, I was able to reproduce the issue and found the root cause to be a missing field in the HTTPRoute CRD schema. I've added the missing field here https://github.com/linkerd/linkerd2/pull/12454 and confirmed that this resolves the issue.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Closing, since this has been resolved