linkerd2 icon indicating copy to clipboard operation
linkerd2 copied to clipboard

Linkerd helm not enforcing PDB values type

Open bwmetcalf opened this issue 1 year ago • 4 comments

What is the issue?

The helm chart allows a string or an int for controller.podDisruptionBudget.maxUnavailable and linkerd will happily deploy with, for example, either 1 or 25% for this value and the PDBs are defined accordingly and correctly. However, as soon as a pod starts up where linkerd-proxy needs to be inject, the following error will occur

admission webhook "linkerd-proxy-injector.linkerd.io" denied the request: failed to unmarshal JSON from: /var/run/linkerd/config/values: error unmarshaling JSON: while decoding JSON: json: cannot unmarshal string into Go struct field PodDisruptionBudget.controller.podDisruptionBudget.maxUnavailable of type int

It appears to be related to the strict typing in Go here: https://github.com/linkerd/linkerd2/blob/0965da659b75ee347269743e350347e92cb0d316/pkg/charts/linkerd2/values.go#L96.

Since k8s allows for percentages for PDBs, it seems the go code should be fixed to allow the same. Either way, if linkerd injection is going to fail if a percent is used in the helm chart values, linkerd should fail to deploy due to the same.

How can it be reproduced?

Define a PDB using a percent, deploy and then attempt to inject linkerd-proxy.

Logs, error output, etc

See initial description.

output of linkerd check -o short

% linkerd check -o short
linkerd-version
---------------
‼ cli is up-to-date
    unsupported version channel: stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane and cli versions match
    control plane running edge-24.10.4 but cli running stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies and cli versions match
    linkerd-destination-6cfbc66dcf-rfqw7 running edge-24.10.4 but cli running stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-cp-proxy-cli-version for hints

linkerd-ha-checks
-----------------
‼ pod injection disabled on kube-system
    kube-system namespace needs to have the label config.linkerd.io/admission-webhooks: disabled if injector webhook failure policy is Fail
    see https://linkerd.io/2.14/checks/#l5d-injection-disabled for hints

linkerd-viz
-----------
‼ viz extension proxies and cli versions match
    metrics-api-6ddbbd456-tmtdq running edge-24.10.4 but cli running stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cli-version for hints

Status check results are √

Environment

% kubectl version Client Version: v1.28.2 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 Server Version: v1.29.8-eks-a737599

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None

bwmetcalf avatar Oct 29 '24 21:10 bwmetcalf

Thanks, @bwmetcalf! As I understand it, actually supporting the "%" form is less important to you than having the chart and the CRD be consistent, correct?

kflynn avatar Oct 31 '24 15:10 kflynn

It would be ideal to be able to use % so we don't have to change the value for maxUnavailable as we increase or decrease the number of replicas. However, if % isn't going to be supported at injection time, the helm chart should not support it either so it's more consistent with a "fail-fast" paradigm.

bwmetcalf avatar Oct 31 '24 16:10 bwmetcalf

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 30 '25 07:01 stale[bot]

I would like to provide a fix for this issue. My idea is to change the maxUnavailable to type IntOrString, like for the PodDisruptionBudget type. Why do you think about that @alpeb?

Also for testing, do I need to generate a new type of golden file and test to cover the % or is there a better alternative?

wim-de-groot avatar Jun 14 '25 22:06 wim-de-groot

Thanks for taking this on, @wim-de-groot! Unfortunately, values.go is a bit of debt in the codebase. It's been a challenge to keep this typed go struct in sync with the values.yml used by the Helm charts and by the chart rendering done by the Linkerd CLI. The only reason we keep it around is because setting fields into a typed go struct is more ergonomic than setting fields (especially deeply nested fields) into an untyped map[string]interface{}.

With that said, there's actually nowhere in our go codebase that we specifically set maxUnavailable so I think we can sidestep this issue by making the Controller field of the Values struct at map[string]interface{}. You can see that we do the same thing for DeploymentStrategy and follow that as an example.

Updating an existing test or adding a new one where we set a % based PDB in the values would be a good way to test this.

adleong avatar Jun 30 '25 18:06 adleong

I set the default value of the controller.podDisruptionBudget.maxUnavailable to 25% instead of just 1. Its the same as for deploymentStrategy. I'am not sure if this is desirable, please let me know.

wim-de-groot avatar Jul 03 '25 17:07 wim-de-groot