IRSA not working for kops 1.22.4
/kind support
1. What kops version are you running? The command kops version will display this information.
1.22.4
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
1.21.9
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
The following was added to the cluster spec:

    serviceAccountIssuerDiscovery:
      discoveryStore: s3://bucket-name
      enableAWSOIDCProvider: true

along with:

    useServiceAccountExternalPermissions: true
Service account and deployment were created through kubectl.
The pod is running with the correct environment variables for the default region, region, role ARN, web identity token file, and STS regional endpoints.
However, running

    kubectl exec -it -n default pod-identity-webhook-test -- aws sts get-caller-identity

or any other aws command fails with the error:

    Unable to locate credentials. You can configure credentials by running "aws configure".
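For context, this "Unable to locate credentials" error usually means the AWS SDK's web-identity credential provider never ran: either the injected environment variables are missing inside the container, or the projected token file is not actually mounted. A minimal sketch of that precondition check (the helper name `irsa_env_problems` is made up for illustration; the env var names are the standard ones the webhook injects):

```python
import os

# Env vars injected for IRSA; the AWS SDK/CLI's web-identity provider
# needs at least these two before it can call AssumeRoleWithWebIdentity.
REQUIRED = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")

def irsa_env_problems(env=os.environ):
    """Return a list of reasons the web-identity provider could not run."""
    problems = [f"missing env var {name}" for name in REQUIRED if not env.get(name)]
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if token_file and not os.path.isfile(token_file):
        problems.append(f"token file {token_file} does not exist")
    return problems
```

If this reports problems when run inside the pod, the mutation (or the token volume projection) is not taking effect, regardless of what the env vars look like from the outside.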
5. Anything else do we need to know?
Not sure if this is an issue with simply not having pod-identity-webhook installed while using the newer useServiceAccountExternalPermissions: true option.
Can share specific lines of cluster manifest or deployment if needed.
The AWS role is also essentially the same as https://github.com/kubernetes/kops/blob/master/tests/integration/update_cluster/many-addons-ccm-irsa24/data/aws_iam_role_aws-load-balancer-controller.kube-system.sa.minimal.example.com_policy
I don't believe that Kops 1.22.x is currently supported. You should upgrade to at least 1.23.2 and test there (Kops 1.23.x is compatible with Kubernetes 1.22.x).
Note also that the bucket you use for this cannot have any periods in its name or it won't work. You also need to use a different bucket than your state bucket, and after running kops update, make sure that files are actually created in the OIDC bucket.
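Those bucket constraints can be captured in a quick sanity check. This is an illustrative helper, not part of kops; the no-periods rule exists because virtual-hosted S3 HTTPS URLs rely on the wildcard certificate *.s3.amazonaws.com, which cannot match a bucket name containing dots, so the OIDC issuer URL would fail TLS verification:

```python
def check_discovery_store(bucket: str, state_bucket: str) -> list:
    """Sanity-check an OIDC discovery bucket name against the rules above."""
    problems = []
    if "." in bucket:
        # dots break TLS for https://<bucket>.s3.amazonaws.com/...
        problems.append("bucket name contains periods")
    if bucket == state_bucket:
        problems.append("discovery bucket must differ from the kops state bucket")
    return problems
```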
Kubernetes 1.22.1

Not that I think this is the culprit or anything, but you should probably upgrade to the latest 1.22 release (1.22.11, I believe), since there are a few known vulnerabilities in that version.
So I was actually wrong about kubernetes version. We are running kubernetes 1.21.9. As far as the bucket, we have tried multiple buckets and none of them have periods in them. We can upgrade kops but how can we know that that will fix the issue? Also let me know if you need more info to troubleshoot this better.
We can upgrade kops but how can we know that that will fix the issue?
How do you know it won't? There have been a huge number of bugfixes between Kops 1.22 and Kops 1.23/1.24, and IRSA has seen several of them as the feature has continued to mature. Kops 1.23/1.24 still supports Kubernetes 1.21.
We are running kubernetes 1.21.9
Kubernetes 1.21 is end of life. You should strongly consider upgrading.
Also let me know if you need more info to troubleshoot this better.
Like I said in my previous comment, you should check that kops update cluster is actually populating your OIDC bucket with the appropriate files, and that the bucket has the right permissions on it (the bucket MUST be world-readable).
Did you install pod-identity-webhook manually? And is it working fine?
You may be able to see the error in the pod-identity-webhook logs as well.
So we have updated kops to 1.23.2; however, when using our pipeline we run into the following issue:

    The plugin returned an unexpected error from
    plugin.(*GRPCProvider).PlanResourceChange: rpc error: code =
    ResourceExhausted desc = grpc: received message larger than max (4304578
    vs. 4194304)
This error involves a certmanager-addons bucket object. It seems to be an issue with Terraform, but we wanted to check whether you think it could be something else.
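For what it's worth, the two numbers in that error are the payload size and gRPC's default maximum receive message size (4 MiB), which the Terraform plugin protocol inherits. The arithmetic, as a quick check:

```python
# gRPC's default max receive message size is 4 MiB.
GRPC_DEFAULT_MAX = 4 * 1024 * 1024   # = 4194304, the second number in the error
payload = 4304578                    # the first number: the oversized message

overshoot = payload - GRPC_DEFAULT_MAX
print(overshoot)                     # 110274 bytes, roughly 108 KiB over the cap
```

So the addon object is only modestly over the limit, which fits the "one unusually large bucket object" symptom rather than a pathological payload.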
We now seem to be having issues with the pod identity webhook. Here are the kubelet logs:
-- Logs begin at Thu 2022-02-17 16:52:04 UTC. --
Jul 19 20:47:32 ip-10-15-4-159 kubelet[5318]: E0719 20:47:32.011164 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:47:36 ip-10-15-4-159 kubelet[5318]: E0719 20:47:36.319480 5318 kubelet.go:1683] "Failed creating a mirror pod for" err="Internal error occurred: failed calling webhook \"pod-identity-webhook.amazonaws.com\": Post \"https://pod-identity-webhook.kube-system.svc:443/mutate?timeout=10s\": dial tcp 100.66.148.239:443: connect: connection refused" pod="kube-system/kube-controller-manager-ip-10-15-4-159.us-west-2.compute.internal"
Jul 19 20:47:37 ip-10-15-4-159 kubelet[5318]: E0719 20:47:37.012028 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:47:42 ip-10-15-4-159 kubelet[5318]: E0719 20:47:42.013117 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:47:42 ip-10-15-4-159 kubelet[5318]: E0719 20:47:42.308427 5318 kubelet.go:1683] "Failed creating a mirror pod for" err="Internal error occurred: failed calling webhook \"pod-identity-webhook.amazonaws.com\": Post \"https://pod-identity-webhook.kube-system.svc:443/mutate?timeout=10s\": dial tcp 100.66.148.239:443: connect: connection refused" pod="kube-system/etcd-manager-main-ip-10-15-4-159.us-west-2.compute.internal"
Jul 19 20:47:47 ip-10-15-4-159 kubelet[5318]: E0719 20:47:47.014484 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:47:47 ip-10-15-4-159 kubelet[5318]: E0719 20:47:47.295964 5318 kubelet.go:1683] "Failed creating a mirror pod for" err="Internal error occurred: failed calling webhook \"pod-identity-webhook.amazonaws.com\": Post \"https://pod-identity-webhook.kube-system.svc:443/mutate?timeout=10s\": dial tcp 100.66.148.239:443: connect: connection refused" pod="kube-system/kube-proxy-ip-10-15-4-159.us-west-2.compute.internal"
Jul 19 20:47:51 ip-10-15-4-159 kubelet[5318]: E0719 20:47:51.296391 5318 kubelet.go:1683] "Failed creating a mirror pod for" err="Internal error occurred: failed calling webhook \"pod-identity-webhook.amazonaws.com\": Post \"https://pod-identity-webhook.kube-system.svc:443/mutate?timeout=10s\": dial tcp 100.66.148.239:443: connect: connection refused" pod="kube-system/etcd-manager-events-ip-10-15-4-159.us-west-2.compute.internal"
Jul 19 20:47:52 ip-10-15-4-159 kubelet[5318]: E0719 20:47:52.015670 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:47:57 ip-10-15-4-159 kubelet[5318]: E0719 20:47:57.016882 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:48:02 ip-10-15-4-159 kubelet[5318]: E0719 20:48:02.017397 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:48:07 ip-10-15-4-159 kubelet[5318]: E0719 20:48:07.017993 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:48:12 ip-10-15-4-159 kubelet[5318]: E0719 20:48:12.019221 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:48:17 ip-10-15-4-159 kubelet[5318]: E0719 20:48:17.019799 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:48:22 ip-10-15-4-159 kubelet[5318]: E0719 20:48:22.021300 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:48:24 ip-10-15-4-159 kubelet[5318]: E0719 20:48:24.288016 5318 kubelet.go:1683] "Failed creating a mirror pod for" err="Internal error occurred: failed calling webhook \"pod-identity-webhook.amazonaws.com\": Post \"https://pod-identity-webhook.kube-system.svc:443/mutate?timeout=10s\": dial tcp 100.66.148.239:443: connect: connection refused" pod="kube-system/kube-scheduler-ip-10-15-4-159.us-west-2.compute.internal"
Jul 19 20:48:26 ip-10-15-4-159 kubelet[5318]: I0719 20:48:26.030440 5318 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-proxy-ip-10-15-4-159.us-west-2.compute.internal" status=Running
Jul 19 20:48:26 ip-10-15-4-159 kubelet[5318]: I0719 20:48:26.030486 5318 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-ip-10-15-4-159.us-west-2.compute.internal" status=Running
Jul 19 20:48:26 ip-10-15-4-159 kubelet[5318]: I0719 20:48:26.030500 5318 kubelet_getters.go:176] "Pod status updated" pod="kube-system/etcd-manager-events-ip-10-15-4-159.us-west-2.compute.internal" status=Running
Jul 19 20:48:26 ip-10-15-4-159 kubelet[5318]: I0719 20:48:26.030514 5318 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-ip-10-15-4-159.us-west-2.compute.internal" status=Running
Jul 19 20:48:26 ip-10-15-4-159 kubelet[5318]: I0719 20:48:26.030527 5318 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-ip-10-15-4-159.us-west-2.compute.internal" status=Running
Jul 19 20:48:26 ip-10-15-4-159 kubelet[5318]: I0719 20:48:26.030540 5318 kubelet_getters.go:176] "Pod status updated" pod="kube-system/etcd-manager-main-ip-10-15-4-159.us-west-2.compute.internal" status=Running
Jul 19 20:48:27 ip-10-15-4-159 kubelet[5318]: E0719 20:48:27.022458 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:48:32 ip-10-15-4-159 kubelet[5318]: E0719 20:48:32.023770 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:48:37 ip-10-15-4-159 kubelet[5318]: E0719 20:48:37.024581 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 19 20:48:42 ip-10-15-4-159 kubelet[5318]: E0719 20:48:42.025196 5318 kubelet.go:2211] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
When cert-manager is enabled and set to managed: true, we get the above gRPC error. But when cert-manager is set to managed: false, we run into this current issue, with nodes and masters not coming up.
Any help would be appreciated. Thanks.
Are you installing the webhook manually? Or running custom static pods? It seems like the webhook is blocking pods that it shouldn't care about.
We installed using the default installation method that kops provides
Can you get the mutating webhook object and paste it here? It looks like it is matching things it never should. I am also confused about the webhook being installed before kops 1.23.
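For reference, a mutating webhook avoids blocking control-plane pods by scoping itself (and by failing open). The fragment below is a hand-written sketch using the standard admissionregistration.k8s.io/v1 fields, not the manifest kops actually ships; the object and service names are assumed. It shows the fields worth checking when you inspect the object with kubectl get mutatingwebhookconfiguration -o yaml:

```yaml
# Sketch only: a scoped mutating webhook. failurePolicy and the selector
# are the fields to check; names here are assumptions, not kops-shipped values.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: pod-identity-webhook
webhooks:
  - name: pod-identity-webhook.amazonaws.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore          # Ignore = pod creation proceeds if the webhook is down
    clientConfig:
      service:
        name: pod-identity-webhook
        namespace: kube-system
        path: /mutate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    namespaceSelector:             # exclude kube-system so static/mirror pods are unaffected
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system"]
```

A failurePolicy of Fail combined with a match-everything scope would produce exactly the "failed calling webhook ... connection refused" mirror-pod errors in the kubelet logs above.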
I also suggest updating to kops 1.24.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.