"kops update cluster" panics while creating JWKS for OIDC
1. What kops version are you running?
Client version: 1.24.1 (git-v1.24.1)
2. What Kubernetes version are you running?
Upgrading from 1.19.9 to 1.21.14.
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
kops replace --filename=cluster.yaml
kops update cluster --yes
5. What happened after the commands executed?
kops update cluster panics while preparing to publish the OIDC discovery documents to an S3 bucket:
W0824 08:36:09.215431 12986 external_access.go:39] KubernetesAPIAccess is empty
I0824 08:36:10.848893 12986 executor.go:111] Tasks: 0 done / 393 total; 110 can run
I0824 08:36:11.805410 12986 executor.go:111] Tasks: 110 done / 393 total; 83 can run
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x108 pc=0x3ec3a7c]
goroutine 1531 [running]:
k8s.io/kops/pkg/model.(*OIDCKeys).Open(0xc0017a32a0?)
k8s.io/kops/pkg/model/issuerdiscovery.go:134 +0x21c
k8s.io/kops/upup/pkg/fi.CopyResource({0x5c5c140, 0xc00104de60}, {0x5c61d40?, 0xc00108c7c8?})
k8s.io/kops/upup/pkg/fi/resources.go:85 +0x72
k8s.io/kops/upup/pkg/fi.ResourceAsBytes({0x5c61d40, 0xc00108c7c8})
k8s.io/kops/upup/pkg/fi/resources.go:112 +0x4c
k8s.io/kops/upup/pkg/fi/fitasks.(*ManagedFile).Render(0x5?, 0x0?, 0xc00168b740?, 0xc000ee9240, 0x2?)
k8s.io/kops/upup/pkg/fi/fitasks/managedfile.go:154 +0x70
reflect.Value.call({0x4f99d60?, 0xc000ee9240?, 0x4?}, {0x537191a, 0x4}, {0xc000c28c60, 0x4, 0x5c91ab0?})
reflect/value.go:556 +0x845
reflect.Value.Call({0x4f99d60?, 0xc000ee9240?, 0x53a8fcc?}, {0xc000c28c60, 0x4, 0x4})
reflect/value.go:339 +0xbf
k8s.io/kops/upup/pkg/fi.(*Context).Render(0xc0011ec0a0, {0x5c633a0?, 0x0}, {0x5c633a0?, 0xc000ee9240}, {0x5c633a0?, 0xc00171f440})
k8s.io/kops/upup/pkg/fi/context.go:225 +0xf2e
k8s.io/kops/upup/pkg/fi.DefaultDeltaRunMethod({0x5c633a0?, 0xc000ee9240}, 0xc0011ec0a0)
k8s.io/kops/upup/pkg/fi/default_methods.go:82 +0x46c
k8s.io/kops/upup/pkg/fi/fitasks.(*ManagedFile).Run(0xc0008a2c18?, 0x0?)
k8s.io/kops/upup/pkg/fi/fitasks/managedfile.go:109 +0x26
k8s.io/kops/upup/pkg/fi.(*executor).forkJoin.func1(0xc00146d8f0, 0x4)
k8s.io/kops/upup/pkg/fi/executor.go:187 +0x1ea
created by k8s.io/kops/upup/pkg/fi.(*executor).forkJoin
k8s.io/kops/upup/pkg/fi/executor.go:183 +0x86
It appears to fail at line 134 of pkg/model/issuerdiscovery.go (per the stack trace above), while trying to access a public key in memory.
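The trace is consistent with a plain nil-pointer dereference: a field is read through a certificate pointer that is nil for some keyset items. The following sketch reproduces that failure mode in isolation. The types `keysetItem` and `certificate` and the function `publicKeyOf` are hypothetical stand-ins loosely modeled on the fields printed later in this report; they are not the actual kOps types.

```go
package main

import (
	"crypto"
	"fmt"
)

// certificate is a hypothetical stand-in for a keyset item's certificate.
type certificate struct {
	CommonName string
	PublicKey  crypto.PublicKey
}

// keysetItem is a hypothetical stand-in for a keyset entry; Certificate may be nil.
type keysetItem struct {
	ID          string
	Certificate *certificate
}

// publicKeyOf mimics the unguarded access pattern: it reads the public key
// through item.Certificate without a nil check. The deferred recover turns
// the resulting runtime panic into an error so the behavior can be observed.
func publicKeyOf(item keysetItem) (key crypto.PublicKey, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("item %s: %v", item.ID, r)
		}
	}()
	return item.Certificate.PublicKey, nil
}

func main() {
	// An item with a private key but no certificate, as seen in this key set.
	_, err := publicKeyOf(keysetItem{ID: "6702426753028327577194087677"})
	fmt.Println(err)
}
```

Without the recover, the same access would crash the whole process, which matches what kops update cluster does here.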
6. What did you expect to happen?
kops update cluster would publish all the OIDC discovery documents to S3, then continue with the rest of its tasks.
7. Please provide your cluster manifest.
cluster.yaml file
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: my-cluster.example.com
spec:
  additionalSans:
  - api.internal.my-cluster.example.com
  api:
    loadBalancer:
      additionalSecurityGroups:
      - sg-005e2b9c6ffed8582
      class: Network
      crossZoneLoadBalancing: true
      type: Public
  authorization:
    rbac: {}
  certManager:
    enabled: true
    managed: false
  cloudConfig:
    disableSecurityGroupIngress: true
  cloudProvider: aws
  clusterAutoscaler:
    balanceSimilarNodeGroups: true
    enabled: true
  configBase: s3://my-kops-state/my-cluster.example.com
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: extensive
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8082
      - name: ETCD_METRICS
        value: basic
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    featureGates:
      EphemeralContainers: "true"
  kubeDNS:
    provider: KubeDNS
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
    featureGates:
      EphemeralContainers: "true"
    kubeReserved:
      cpu: 750m
      memory: .75Gi
  kubernetesVersion: 1.21.14
  metricsServer:
    enabled: true
  networkCIDR: 10.3.0.0/16
  networkID: vpc-087cd3eb3bf613986
  networking:
    calico:
      bpfEnabled: true
      crossSubnet: true
      encapsulationMode: vxlan
      typhaReplicas: 3
  nonMasqueradeCIDR: 100.64.0.0/10
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://my-kops-oidc-discovery/my-cluster
    enableAWSOIDCProvider: true
  sshAccess:
  - 184.74.210.37/32
  - 184.74.210.38/32
  - 207.141.66.101/32
  - 207.141.66.99/32
  - 212.187.232.28/32
  - 212.187.232.29/32
  - 4.53.131.109/32
  - 4.53.131.110/32
  - 4.71.99.125/32
  - 4.71.99.126/32
  subnets:
  - cidr: 10.3.100.0/22
    id: subnet-0cd20dfb64345dede
    name: utility-us-east-2a
    type: Utility
    zone: us-east-2a
  - cidr: 10.3.104.0/22
    id: subnet-0657e2c2163960a79
    name: utility-us-east-2b
    type: Utility
    zone: us-east-2b
  - cidr: 10.3.108.0/22
    id: subnet-013e44ade2633a1b1
    name: utility-us-east-2c
    type: Utility
    zone: us-east-2c
  - cidr: 10.3.0.0/22
    egress: nat-06a85bf97c4a5b65d
    id: subnet-0ca2f5a3ab50e538e
    name: us-east-2a
    type: Private
    zone: us-east-2a
  - cidr: 10.3.4.0/22
    egress: nat-054d637847b63ea36
    id: subnet-047a72902591ebe60
    name: us-east-2b
    type: Private
    zone: us-east-2b
  - cidr: 10.3.8.0/22
    egress: nat-0df765ca07bb44f0f
    id: subnet-051d2325bcab67fa6
    name: us-east-2c
    type: Private
    zone: us-east-2c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private
  updatePolicy: external
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
Here is the kops update cluster output at verbosity level ten, just before the failure:
I0824 08:54:41.457895 13698 executor.go:186] Executing task "MirrorSecrets/mirror-secrets": *fitasks.MirrorSecrets {"Name":"mirror-secrets","Lifecycle":"Sync","MirrorPath":{}}
I0824 08:54:41.461003 13698 request_logger.go:45] AWS request: ec2/DescribeSecurityGroups
I0824 08:54:41.461652 13698 request_logger.go:45] AWS request: iam/GetInstanceProfile
I0824 08:54:41.462143 13698 request_logger.go:45] AWS request: ec2/DescribeSubnets
I0824 08:54:41.462148 13698 request_logger.go:45] AWS request: iam/GetInstanceProfile
I0824 08:54:41.463820 13698 request_logger.go:45] AWS request: iam/ListAttachedRolePolicies
I0824 08:54:41.472058 13698 request_logger.go:45] AWS request: iam/GetRolePolicy
I0824 08:54:41.472136 13698 request_logger.go:45] AWS request: ec2/DescribeSubnets
I0824 08:54:41.472541 13698 s3fs.go:329] Reading file "s3://my-kops-oidc-discovery/my-cluster/openid/v1/jwks"
I0824 08:54:41.472820 13698 request_logger.go:45] AWS request: iam/GetRolePolicy
I0824 08:54:41.473050 13698 request_logger.go:45] AWS request: ec2/DescribeInternetGateways
I0824 08:54:41.473944 13698 request_logger.go:45] AWS request: ec2/DescribeSubnets
I0824 08:54:41.474061 13698 request_logger.go:45] AWS request: ec2/DescribeSecurityGroups
I0824 08:54:41.473806 13698 request_logger.go:45] AWS request: iam/ListAttachedRolePolicies
I0824 08:54:41.473914 13698 request_logger.go:45] AWS request: iam/GetRolePolicy
I0824 08:54:41.510755 13698 request_logger.go:45] AWS request: elasticloadbalancing/DescribeTargetGroups
panic: runtime error: invalid memory address or nil pointer dereference
Earlier, I see this pertinent log message:
I0824 08:54:41.459055 13698 executor.go:186] Executing task "ManagedFile/keys.json": *fitasks.ManagedFile {"Name":"keys.json","Lifecycle":"Sync","Base":"s3://my-kops-oidc-discovery/my-cluster","Location":"openid/v1/jwks","Contents":{"SigningKey":{"Name":"service-account","alternateNames":null,"Lifecycle":"Sync","Signer":null,"subject":"cn=service-account","issuer":"","type":"ca","oldFormat":false}},"Public":true}
Note that the S3 bucket currently exists, but it contains no object at the path my-cluster/openid/v1/jwks.
9. Anything else do we need to know?
With earlier versions of kOps, I have successfully upgraded clusters and enabled the behavior of the "spec.serviceAccountIssuerDiscovery.enableAWSOIDCProvider" field; those versions wrote this S3 object as needed. This version of kOps appears to fail before it can create the object, although it did create the my-cluster/.well-known/openid-configuration object in the same bucket.
See #13353 for what looks to be an earlier report of a similar defect.
See the prior discussion in the "kops-users" channel of the "Kubernetes" Slack workspace.
/kind bug
It turns out that the KeysetItem.Certificate field is nil in all but the last two items in my key set. I added some diagnostic output to (*OIDCKeys).Open, which reports the following:
Number of keys in key set: 7
Key set item "6702426753028327577194087677": &{6702426753028327577194087677 <nil> <nil> 0xc001200b50}
(ID: "6702426753028327577194087677", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc000fd92c0})
Key set item "6717351783746805535929340772": &{6717351783746805535929340772 <nil> <nil> 0xc001200b90}
(ID: "6717351783746805535929340772", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc000fd9440})
Key set item "6724755564554290971271764485": &{6724755564554290971271764485 <nil> <nil> 0xc001200bd0}
(ID: "6724755564554290971271764485", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc000fd9500})
Key set item "6725145319226802661715703465": &{6725145319226802661715703465 <nil> <nil> 0xc001200c10}
(ID: "6725145319226802661715703465", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc000fd95c0})
Key set item "6727272810098431180443208693": &{6727272810098431180443208693 <nil> <nil> 0xc001200c50}
(ID: "6727272810098431180443208693", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc001508180})
Key set item "6727329898571771312485446625": &{6727329898571771312485446625 <nil> 0xc000afc000 0xc001200d00}
(ID: "6727329898571771312485446625", distrust timestamp <nil>, certificate: &{CN=kubernetes-master false 0xc00206c580 0xc001200cb0}, private key: &{0xc0015084e0})
Key set item "6906097667750333366645304518": &{6906097667750333366645304518 <nil> 0xc000afc120 0xc001200e90}
(ID: "6906097667750333366645304518", distrust timestamp <nil>, certificate: &{CN=service-account true 0xc00206cb00 0xc001200e20}, private key: &{0xc001508660})
Adding the following guard condition to the loop in (*OIDCKeys).Open appears to filter the key set items down to just those that contain a certificate whose common name is "service-account":
if item.Certificate == nil || item.Certificate.Subject.CommonName != "service-account" {
	continue
}
Does that preserve all the items that this method was expecting to consume?
Note that the kops get keypairs subcommand fails similarly, because it assumes that every key set item contains an X.509 certificate.
% kops get keypairs
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x100 pc=0x42a9788]
goroutine 1 [running]:
main.listKeypairs({0x5c80b38?, 0xc00066d620?}, {0x7dfba58, 0x0, 0x25?}, 0x0)
k8s.io/kops/cmd/kops/get_keypairs.go:127 +0x2e8
main.RunGetKeypairs({0x5c7d260, 0xc0000520e8}, {0x5c61a80?, 0xc000c09080?}, {0x5c639c0?, 0xc00000e018?}, 0xc0008b6270)
k8s.io/kops/cmd/kops/get_keypairs.go:174 +0xf8
main.NewCmdGetKeypairs.func3(0xc000e65680?, {0x7dfba58?, 0x0?, 0x0?})
k8s.io/kops/cmd/kops/get_keypairs.go:78 +0x3e
github.com/spf13/cobra.(*Command).execute(0xc000e65680, {0x7dfba58, 0x0, 0x0})
github.com/spf13/cobra@…/command.go:872 +0x694
github.com/spf13/cobra.(*Command).ExecuteC(0x7da7c00)
github.com/spf13/cobra@…/command.go:990 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@…/command.go:918
main.Execute()
k8s.io/kops/cmd/kops/root.go:95 +0x5c
main.main()
k8s.io/kops/cmd/kops/main.go:20 +0x17
@seh Would you like to continue to iterate on the fix?
Yes, though it would help to hear whether or not these entries that lack certificates are valid. Can kOps use them for anything? Should I ignore them as if they were distrusted?
I would say: ignore them as distrusted, but still list them and make them deletable.
I guess I didn't research far back enough in the history of the keystore code.
kOps can't use a private key without a certificate for anything unless/until it generates a corresponding certificate. (Though for service-account keypairs the only part of the certificate it uses is the public key.)
These days all code paths that create a key also create a corresponding certificate. I would agree that keys without certificates should be ignored as if distrusted.
As this is not a regression or something that breaks things for a lot of users, I removed the blocks-next label.