kOps skips "public-read" ACL on OIDC-related S3 objects
1. What kops version are you running? The command kops version will display
this information.
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
- Create a cluster.yaml file with a Cluster and a few InstanceGroup manifests. Include a few crucial fields in the Cluster spec, setting "spec.serviceAccountIssuerDiscovery.enableAWSOIDCProvider" to true and "spec.serviceAccountIssuerDiscovery.discoveryStore" to an S3 URL pointing at an object prefix within an existing S3 bucket.
- kops create --filename=cluster.yaml
- kops replace --filename=cluster.yaml --force
- kops update cluster --name="${cluster_name}" --admin --yes
5. What happened after the commands executed?
kOps creates the cluster, the control plane machines, and a few worker machines, but none of the pods' containers that attempt to authenticate to AWS via OIDC succeed. This is because the S3 objects for both the OIDC discovery document and the JWKS are not readable publicly.
6. What did you expect to happen?
kOps would create the two aforementioned S3 objects and set the "public-read" ACL on them, as it did up through kOps version 1.25.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
cluster.yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: mycluster.example.comt
spec:
  additionalSans:
  - api.mycluster.example.comt
  - api.internal.mycluster.example.comt
  api:
    loadBalancer:
      additionalSecurityGroups:
      - sg-04bfa48a96656906e
      class: Network
      crossZoneLoadBalancing: true
      type: Public
  authorization:
    rbac: {}
  awsLoadBalancerController:
    enableWAF: true
    enableWAFv2: true
    enabled: true
  certManager:
    enabled: true
    managed: true
  cloudConfig:
    disableSecurityGroupIngress: true
    manageStorageClasses: false
  cloudProvider: aws
  clusterAutoscaler:
    balanceSimilarNodeGroups: true
    enabled: true
  configBase: s3://my-bucket/mycluster.example.comt
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: extensive
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8082
      - name: ETCD_METRICS
        value: basic
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
    useServiceAccountExternalPermissions: true
  kubeAPIServer:
    featureGates:
      StatefulSetAutoDeletePVC: "true"
  kubeControllerManager:
    featureGates:
      StatefulSetAutoDeletePVC: "true"
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
    kubeReserved:
      cpu: 750m
      memory: .75Gi
  kubernetesVersion: 1.26.3
  metricsServer:
    enabled: true
  networkCIDR: 10.3.0.0/16
  networkID: vpc-0b963d861ceaf3b17
  networking:
    calico:
      bpfEnabled: true
      crossSubnet: true
      encapsulationMode: vxlan
      typhaReplicas: 3
  nonMasqueradeCIDR: 100.64.0.0/10
  podIdentityWebhook:
    enabled: true
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://my-bucket/mycluster
    enableAWSOIDCProvider: true
  subnets:
  - cidr: 10.3.100.0/22
    id: subnet-0c828450b78705439
    name: utility-us-east-2a
    type: Utility
    zone: us-east-2a
  - cidr: 10.3.104.0/22
    id: subnet-0398d4a75a3888c0c
    name: utility-us-east-2b
    type: Utility
    zone: us-east-2b
  - cidr: 10.3.108.0/22
    id: subnet-058646b4f6fbd9929
    name: utility-us-east-2c
    type: Utility
    zone: us-east-2c
  - cidr: 10.3.0.0/22
    egress: nat-094e8b4023a0f8093
    id: subnet-08a9f946eb2814ee4
    name: us-east-2a
    type: Private
    zone: us-east-2a
  - cidr: 10.3.4.0/22
    egress: nat-026c73e6779288dc7
    id: subnet-0442a24d37181e2a3
    name: us-east-2b
    type: Private
    zone: us-east-2b
  - cidr: 10.3.8.0/22
    egress: nat-0e034de3e76de6d12
    id: subnet-03617a6eec6a5fbed
    name: us-east-2c
    type: Private
    zone: us-east-2c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private
8. Anything else we need to know?
Per preceding discussion in the "kops-users" channel of the "Kubernetes" Slack workspace, this behavior changed in #14788. When kOps checks if my S3 bucket is public, it concludes that it is, and thus skips attaching the "public-read" ACL to the objects that it creates.
The AWS web console claims that my S3 bucket is "public" or "publicly accessible," but it has a bucket policy that grants the "s3:GetObject" action on the two OIDC-related objects (jwks and openid-configuration) to all principals, and it also has one of the four individual "Block Public Access" settings enabled for the bucket: "Block public access to buckets and objects granted through new public bucket or access point policies."
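For anyone trying to reproduce or inspect this bucket configuration, the relevant settings are visible through the S3 API. Below is a minimal diagnostic sketch using aws-sdk-go (v1); the region and bucket name are placeholders, and error handling is abbreviated:

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// Placeholder region and bucket name.
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-2")}))
	svc := s3.New(sess)
	bucket := aws.String("my-bucket")

	// Does AWS consider the bucket "public" based on its bucket policy?
	ps, err := svc.GetBucketPolicyStatus(&s3.GetBucketPolicyStatusInput{Bucket: bucket})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("IsPublic:", aws.BoolValue(ps.PolicyStatus.IsPublic))

	// Which of the four Block Public Access settings are enabled?
	// (A NoSuchPublicAccessBlockConfiguration error means none are configured.)
	pab, err := svc.GetPublicAccessBlock(&s3.GetPublicAccessBlockInput{Bucket: bucket})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(pab.PublicAccessBlockConfiguration)
}
```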
Without the "public-read" ACL set on these two S3 objects, attempting to fetch them over HTTPS provokes responses with status code 403, reporting that access is denied. Setting that "public-read" ACL manually on these objects fixes that problem, and allows AWS's STS to read these documents when authenticating Kubernetes service account tokens.
Is my S3 bucket configured incorrectly? My goal is to allow public access to only the two OIDC-related objects for each Kubernetes cluster's IAM identity provider, and to regulate carefully which principals can write these objects.
/kind bug
After much study and experimentation, I learned enough to bypass this problem.
In my S3 bucket, I had the Object Ownership setting at "Bucket Owner Preferred," which leaves ACLs enabled on the bucket. I also have a bucket policy attached that includes a statement granting the "s3:GetObject" action to the "*" principal, which causes AWS to consider the bucket "public," since I did not have any of the "Block Public Access" settings enabled that would preclude that policy statement from taking effect.
For an S3 bucket like this, with ACLs enabled and which also qualifies as "public," kOps now does the wrong thing by skipping creation of the per-object ACL. However, AWS today recommends against using ACLs on buckets at all. Since our bucket policy already enforces the permissions we intended to grant, I changed our bucket's Object Ownership setting to "Bucket Owner Enforced." That disables ACLs entirely, so the "public-read" per-object ACL is no longer needed on the objects that kOps creates. With that configuration, when kOps decides against adding the per-object ACL, everyone can still read the S3 objects: doing nothing produces the right result.
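For reference, the Object Ownership change described above can also be made through the API rather than the console. A minimal sketch using aws-sdk-go (v1), with placeholder region and bucket name:

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-2")}))
	svc := s3.New(sess)

	// Switch the bucket to "Bucket owner enforced", which disables ACLs entirely.
	_, err := svc.PutBucketOwnershipControls(&s3.PutBucketOwnershipControlsInput{
		Bucket: aws.String("my-bucket"), // placeholder
		OwnershipControls: &s3.OwnershipControls{
			Rules: []*s3.OwnershipControlsRule{
				{ObjectOwnership: aws.String(s3.ObjectOwnershipBucketOwnerEnforced)},
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```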
The question, then, is whether kOps should try harder to detect whether an S3 bucket that's considered "public" still uses ACLs. If the bucket does still use ACLs, then adding the per-object ACLs would still be necessary.
We could consider augmenting our predicate for whether to apply a per-object "public-read" ACL here. As it stands, we look only at the response from GetBucketPolicyStatus, and if its "IsPublic" response field is true, we skip applying an ACL. We could then also confirm via GetBucketOwnershipControls that no ACLs are in use. If its response contains a rule whose ObjectOwnership value is "BucketOwnerPreferred" or "ObjectWriter" rather than "BucketOwnerEnforced", then we'd still need to apply a per-object ACL.
If we did that, we'd need to grant the "s3:GetBucketOwnershipControls" IAM action to principals that run the kops update cluster command.
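To make the proposal concrete, here is a minimal sketch of that augmented predicate. It uses aws-sdk-go (v1) directly rather than kOps's actual internal helpers; the function name is hypothetical and error handling is simplified:

```go
// Sketch only: not kOps's actual code; names here are hypothetical.
package discovery

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/s3"
)

// needsPublicReadACL returns true when kOps would still need to apply the
// per-object "public-read" ACL: either the bucket is not public via policy,
// or it is public but ACLs remain in effect (ownership is not "BucketOwnerEnforced").
func needsPublicReadACL(svc *s3.S3, bucket string) (bool, error) {
	ps, err := svc.GetBucketPolicyStatus(&s3.GetBucketPolicyStatusInput{Bucket: aws.String(bucket)})
	if err != nil {
		return false, err
	}
	if !aws.BoolValue(ps.PolicyStatus.IsPublic) {
		// Not public via bucket policy: keep the current behavior and set the ACL.
		return true, nil
	}

	oc, err := svc.GetBucketOwnershipControls(&s3.GetBucketOwnershipControlsInput{Bucket: aws.String(bucket)})
	if err != nil {
		if aerr, ok := err.(awserr.Error); ok && aerr.Code() == "OwnershipControlsNotFoundError" {
			// No ownership controls configured: the bucket behaves like "ObjectWriter",
			// so ACLs are still in effect and the per-object ACL is still needed.
			return true, nil
		}
		return false, err
	}
	for _, rule := range oc.OwnershipControls.Rules {
		if aws.StringValue(rule.ObjectOwnership) == s3.ObjectOwnershipBucketOwnerEnforced {
			// ACLs are disabled; the bucket policy alone must grant public read.
			return false, nil
		}
	}
	// "BucketOwnerPreferred" or "ObjectWriter": ACLs still apply, so set the object ACL.
	return true, nil
}
```

In this shape, the GetBucketOwnershipControls call only happens once GetBucketPolicyStatus has already reported the bucket as public, so the additional "s3:GetBucketOwnershipControls" permission would only be exercised in that case.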
Office hours:
- In favor of adding call to GetBucketOwnershipControls
- Will look into whether dryrun could be made to output manifest diffs
- Considering tying addon upgrades to k8s versions
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/help
@johngmyers: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
/help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle stale