linkerd2
linkerd-destination policy fails on initial pod list, if it contains an object with invalid values
What is the issue?
If a single pod object contains an invalid value, the policy container of the linkerd-destination pods fails to become ready while parsing the initial pod list. Without this deployment available, all meshed pods stop working properly.
How can it be reproduced?
1. Create a pod with the invalid value null in spec.volumes[].projected.sources:
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: fail-parsing
  namespace: default
  annotations:
    linkerd.io/inject: enabled
spec:
  volumes:
  - name: exporter-config
    projected:
      sources: null
      defaultMode: 420
  containers:
  - name: exporter
    image: "busybox"
    args:
    - sleep
    - 1h
    volumeMounts:
    - name: exporter-config
      mountPath: /conf
EOF
Note: The Kubernetes API server accepts this object.
2. Restart the linkerd-destination deployment:
kubectl rollout restart deployment -n linkerd linkerd-destination
Logs, error output, etc
{"timestamp":"2022-06-14T10:39:49.418693Z","level":"WARN","fields":{"message":"{\"kind\":\"PodList\",\"apiVersion\":\"v1\" ... \"qosClass\":\"Burstable\"}}]}\n, Error(\"invalid type: null, expected a sequence\", line: 1, column: 2223366)"},"target":"kube::client","spans":[{"name":"pods"}]}
{"timestamp":"2022-06-14T10:39:49.460721Z","level":"INFO","fields":{"message":"Failed","error":"failed to perform initial object list: Error deserializing response"},"target":"linkerd_policy_controller_k8s_api::watch","spans":[]}
In plain text:
Error("invalid type: null, expected a sequence", line: 1, column: 2223366)
failed to perform initial object list: Error deserializing response
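The failure mode in these logs can be illustrated with a small Python sketch. This is not the policy controller's code (the controller is written in Rust); the names `decode_pod` and `decode_pod_list` are hypothetical, and the sketch assumes a strict decoder that requires `sources` to be a list. It shows how one invalid item makes decoding of the entire list response fail:

```python
import json

def decode_pod(pod: dict) -> dict:
    """Strict decoding (assumed): a projected volume must carry a list of sources."""
    for volume in (pod.get("spec") or {}).get("volumes") or []:
        projected = volume.get("projected")
        if projected is not None and not isinstance(projected.get("sources"), list):
            raise ValueError("projected.sources: expected a sequence")
    return pod

def decode_pod_list(body: str) -> list:
    """Decode the whole response in one pass; one bad item fails everything."""
    return [decode_pod(item) for item in json.loads(body)["items"]]

# A two-item list: one valid pod and one shaped like the reproduction above.
BODY = json.dumps({"items": [
    {"metadata": {"name": "ok"}, "spec": {"volumes": []}},
    {"metadata": {"name": "fail-parsing"},
     "spec": {"volumes": [{"name": "exporter-config",
                           "projected": {"sources": None, "defaultMode": 420}}]}},
]})

try:
    pods = decode_pod_list(BODY)
except ValueError as err:
    # The valid pod is lost too, because the list is decoded as a unit.
    pods = None
    print(f"failed to perform initial object list: {err}")
```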
output of linkerd check -o short
❯ linkerd check -o short
Linkerd core checks
===================
linkerd-version
---------------
‼ cli is up-to-date
is running version 2.11.1 but the latest stable version is 2.11.2
see https://linkerd.io/2.11/checks/#l5d-version-cli for hints
control-plane-version
---------------------
‼ control plane is up-to-date
is running version 2.11.1 but the latest stable version is 2.11.2
see https://linkerd.io/2.11/checks/#l5d-version-control for hints
linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
some proxies are not running the current version:
* linkerd-destination-76f9b7cccb-b7rdr (stable-2.11.1)
* linkerd-destination-76f9b7cccb-gzvpv (stable-2.11.1)
* linkerd-destination-76f9b7cccb-hxh8c (stable-2.11.1)
* linkerd-identity-8448f698-h572b (stable-2.11.1)
* linkerd-identity-8448f698-rfdxx (stable-2.11.1)
* linkerd-identity-8448f698-xqlpg (stable-2.11.1)
* linkerd-proxy-injector-85df7dd89-hfcwm (stable-2.11.1)
* linkerd-proxy-injector-85df7dd89-tzmnh (stable-2.11.1)
* linkerd-proxy-injector-85df7dd89-v9w7q (stable-2.11.1)
see https://linkerd.io/2.11/checks/#l5d-cp-proxy-version for hints
Status check results are √
Linkerd extensions checks
=========================
linkerd-jaeger
--------------
‼ collector and jaeger service account exists
missing ServiceAccounts: jaeger
see https://linkerd.io/2.11/checks/#l5d-jaeger-sc-exists for hints
Status check results are √
linkerd-viz
-----------
‼ viz extension proxies are up-to-date
some proxies are not running the current version:
* grafana-5487ffc69d-jqjfj (stable-2.11.1)
* metrics-api-65799f4f58-9hv66 (stable-2.11.1)
* tap-54ddb4d68b-cf7pg (stable-2.11.1)
* tap-54ddb4d68b-q8pm4 (stable-2.11.1)
* tap-54ddb4d68b-zwc6p (stable-2.11.1)
* tap-injector-5887f7db94-8f2s7 (stable-2.11.1)
* web-75d7f664b-2jhj5 (stable-2.11.1)
see https://linkerd.io/2.11/checks/#l5d-viz-proxy-cp-version for hints
‼ prometheus is installed and configured correctly
missing ClusterRoles: linkerd-linkerd-viz-prometheus
see https://linkerd.io/2.11/checks/#l5d-viz-prometheus for hints
Status check results are √
Environment
- Kubernetes version: v1.21.11-gke.900
- Environment: GKE
- Host OS: Linux
- Linkerd version: stable-2.11.1
Possible solution
To prevent a problem with one pod from breaking the whole mesh, Linkerd could skip pods with invalid values, while logging an error.
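A rough Python sketch of that idea follows. It is only an illustration of the proposed behavior, not Linkerd code; `sources_are_valid` and `index_pods` are hypothetical names, and it assumes the list response can be decoded item by item, which may not hold for the actual API client:

```python
import json
import logging

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger("pod-index")

def sources_are_valid(pod: dict) -> bool:
    """Return False if any projected volume has a non-list sources value."""
    for volume in (pod.get("spec") or {}).get("volumes") or []:
        projected = volume.get("projected")
        if projected is not None and not isinstance(projected.get("sources"), list):
            return False
    return True

def index_pods(body: str) -> list:
    """Keep the names of valid pods; log and skip the invalid ones."""
    valid = []
    for pod in json.loads(body)["items"]:
        name = (pod.get("metadata") or {}).get("name", "<unknown>")
        if sources_are_valid(pod):
            valid.append(name)
        else:
            log.error("skipping pod %s: projected.sources is not a sequence", name)
    return valid

BODY = json.dumps({"items": [
    {"metadata": {"name": "ok"}, "spec": {"volumes": []}},
    {"metadata": {"name": "fail-parsing"},
     "spec": {"volumes": [{"name": "exporter-config",
                           "projected": {"sources": None}}]}},
]})

indexed = index_pods(BODY)
print(indexed)
```

With this approach the invalid pod is reported and left out, while the remaining pods are still indexed.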
Additional context
The API reference for Kubernetes 1.21 does not mention null as a valid value for sources in ProjectedVolumeSource, but the pod object is still created.
Would you like to work on fixing this bug?
No response
This is most likely a problem that will have to be solved in https://github.com/Arnavion/k8s-openapi and, ultimately, in the Kubernetes API spec. Since the API spec does not describe the field as optional, deserializers that are derived from the API spec expect the field to be required.
This is similar to another issue we encountered: https://github.com/kubernetes/kubernetes/issues/100802
To prevent a problem with one pod from breaking the whole mesh, Linkerd could skip pods with invalid values, while logging an error.
I'm not sure that we can realistically work around this in Linkerd, because we don't actually handle decoding individual pod responses. Rather, the API client returns an error for the whole API response. The best we could do is to update k8s-openapi to treat the field as optional.
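To make "treat the field as optional" concrete, here is a minimal Python sketch of the decoding behavior that would result. It does not reflect k8s-openapi's actual API or implementation (which is Rust); `normalize_sources` is a hypothetical helper, and mapping null to an empty list is an assumption about how an optional field would be handled:

```python
import json

def normalize_sources(pod: dict) -> dict:
    """Replace a null or missing projected.sources with an empty list, in place."""
    for volume in (pod.get("spec") or {}).get("volumes") or []:
        projected = volume.get("projected")
        if projected is not None and projected.get("sources") is None:
            projected["sources"] = []
    return pod

raw = json.loads(
    '{"metadata": {"name": "fail-parsing"},'
    ' "spec": {"volumes": [{"name": "exporter-config",'
    ' "projected": {"sources": null, "defaultMode": 420}}]}}'
)

pod = normalize_sources(raw)
sources = pod["spec"]["volumes"][0]["projected"]["sources"]
print(sources)
```

Under this assumption the pod from the reproduction decodes successfully, so the list it belongs to no longer fails as a whole.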
Potentially related to https://github.com/kubernetes/kubernetes/issues/93903
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.