
kOps skips "public-read" ACL on OIDC-related S3 objects

seh opened this issue 2 years ago • 7 comments

1. What kops version are you running? The command kops version will display this information.

1.26.2

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

1.26.3

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

  1. Create a cluster.yaml file with a Cluster and a few InstanceGroup manifests.
    Include a few crucial fields in the Cluster spec, setting "spec.serviceAccountIssuerDiscovery.enableAWSOIDCProvider" to true and "spec.serviceAccountIssuerDiscovery.discoveryStore" to an S3 URL pointing at an object prefix within an existing S3 bucket.
  2. kops create --filename=cluster.yaml
  3. kops replace --filename=cluster.yaml --force
  4. kops update cluster --name="${cluster_name}" --admin --yes

5. What happened after the commands executed?

kOps creates the cluster, the control plane machines, and a few worker machines, but none of the pods' containers that attempt to authenticate to AWS via OIDC succeed. This is because the S3 objects for both the OIDC discovery document and the JWKS are not readable publicly.

6. What did you expect to happen?

kOps would create the two aforementioned S3 objects and set the "public-read" ACL on them, as it did through kOps version 1.25.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

cluster.yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: mycluster.example.comt
spec:
  additionalSans:
  - api.mycluster.example.comt
  - api.internal.mycluster.example.comt
  api:
    loadBalancer:
      additionalSecurityGroups:
      - sg-04bfa48a96656906e
      class: Network
      crossZoneLoadBalancing: true
      type: Public
  authorization:
    rbac: {}
  awsLoadBalancerController:
    enableWAF: true
    enableWAFv2: true
    enabled: true
  certManager:
    enabled: true
    managed: true
  cloudConfig:
    disableSecurityGroupIngress: true
    manageStorageClasses: false
  cloudProvider: aws
  clusterAutoscaler:
    balanceSimilarNodeGroups: true
    enabled: true
  configBase: s3://my-bucket/mycluster.example.comt
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: extensive
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8082
      - name: ETCD_METRICS
        value: basic
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
    useServiceAccountExternalPermissions: true
  kubeAPIServer:
    featureGates:
      StatefulSetAutoDeletePVC: "true"
  kubeControllerManager:
    featureGates:
      StatefulSetAutoDeletePVC: "true"
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
    kubeReserved:
      cpu: 750m
      memory: .75Gi
  kubernetesVersion: 1.26.3
  metricsServer:
    enabled: true
  networkCIDR: 10.3.0.0/16
  networkID: vpc-0b963d861ceaf3b17
  networking:
    calico:
      bpfEnabled: true
      crossSubnet: true
      encapsulationMode: vxlan
      typhaReplicas: 3
  nonMasqueradeCIDR: 100.64.0.0/10
  podIdentityWebhook:
    enabled: true
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://my-bucket/mycluster
    enableAWSOIDCProvider: true
  subnets:
  - cidr: 10.3.100.0/22
    id: subnet-0c828450b78705439
    name: utility-us-east-2a
    type: Utility
    zone: us-east-2a
  - cidr: 10.3.104.0/22
    id: subnet-0398d4a75a3888c0c
    name: utility-us-east-2b
    type: Utility
    zone: us-east-2b
  - cidr: 10.3.108.0/22
    id: subnet-058646b4f6fbd9929
    name: utility-us-east-2c
    type: Utility
    zone: us-east-2c
  - cidr: 10.3.0.0/22
    egress: nat-094e8b4023a0f8093
    id: subnet-08a9f946eb2814ee4
    name: us-east-2a
    type: Private
    zone: us-east-2a
  - cidr: 10.3.4.0/22
    egress: nat-026c73e6779288dc7
    id: subnet-0442a24d37181e2a3
    name: us-east-2b
    type: Private
    zone: us-east-2b
  - cidr: 10.3.8.0/22
    egress: nat-0e034de3e76de6d12
    id: subnet-03617a6eec6a5fbed
    name: us-east-2c
    type: Private
    zone: us-east-2c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

8. Anything else we need to know?

Per preceding discussion in the "kops-users" channel of the "Kubernetes" Slack workspace, this behavior changed in #14788. When kOps checks if my S3 bucket is public, it concludes that it is, and thus skips attaching the "public-read" ACL to the objects that it creates.

The AWS Web console claims that my S3 bucket is "public" or "publicly accessible," but it has an IAM policy that grants the "s3:GetObject" action for the two OIDC-related objects (jwks and openid-configuration) to all principals, and it also has one of the four options enabled under "Individual Block Public Access settings for this bucket": "Block public access to buckets and objects granted through new public bucket or access point policies."
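For reference, a bucket policy statement of the shape described above might look like the following. The bucket name and object keys here are taken from the (redacted) manifest above, and the key layout mirrors the standard Kubernetes service account issuer discovery paths; the actual keys kOps writes under the discovery store prefix may differ.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPublicReadOfOIDCDocuments",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": [
        "arn:aws:s3:::my-bucket/mycluster/.well-known/openid-configuration",
        "arn:aws:s3:::my-bucket/mycluster/openid/v1/jwks"
      ]
    }
  ]
}
```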

Without the "public-read" ACL set on these two S3 objects, attempting to fetch them over HTTPS provokes responses with status code 403, reporting that access is denied. Setting that "public-read" ACL manually on these objects fixes that problem, and allows AWS's STS to read these documents when authenticating Kubernetes service account tokens.

Is my S3 bucket configured incorrectly? My goal is to allow public access to only the two OIDC-related objects for each Kubernetes cluster's IAM identity provider, and to regulate carefully which principals can write these objects.
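As a stopgap, the manual fix mentioned above can be applied with the AWS CLI. This is only an illustration against a live bucket: the bucket name mirrors the manifest above, and the object keys are assumed to follow the standard issuer discovery layout under the discovery store prefix.

```shell
# Assumed object keys under the discovery store prefix; adjust to match
# what kOps actually wrote to s3://my-bucket/mycluster.
aws s3api put-object-acl --bucket my-bucket \
  --key mycluster/.well-known/openid-configuration --acl public-read
aws s3api put-object-acl --bucket my-bucket \
  --key mycluster/openid/v1/jwks --acl public-read

# Verify the discovery document is now fetchable anonymously (expect HTTP 200).
curl -so /dev/null -w '%{http_code}\n' \
  "https://my-bucket.s3.us-east-2.amazonaws.com/mycluster/.well-known/openid-configuration"
```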

/kind bug

seh avatar Apr 05 '23 18:04 seh

After much study and experimentation, I learned enough to bypass this problem.

In my S3 bucket, I had the Object Ownership set to "Bucket Owner Preferred," which enables ACLs on the bucket. I also have an IAM policy attached that includes a statement granting the "s3:GetObject" action to the "*" principal, which causes AWS to consider the bucket as "public," since I did not have any of the "Block Public Access" settings enabled that would preclude that IAM policy statement from taking effect.

For an S3 bucket like this, with ACLs enabled, that also qualifies as "public," kOps now does the wrong thing by skipping creation of the per-object ACL. However, AWS now recommends against using ACLs on buckets. Since our IAM policy already enforces the permissions we intended to grant, I changed our bucket's Object Ownership to "Bucket Owner Enforced." That disables ACLs entirely, so the "public-read" per-object ACL is no longer needed on the objects that kOps creates. With that configuration, when kOps decides against adding the per-object ACL, we still wind up with everyone being able to read the S3 objects. Doing nothing produces the right result.

The question, then, is whether kOps should try harder to detect whether an S3 bucket that's considered "public" still uses ACLs. If the bucket does still use ACLs, then adding the per-object ACLs would still be necessary.

seh avatar Apr 06 '23 13:04 seh

We could consider augmenting our predicate for whether to apply a per-object "public-read" ACL here. As it stands, we look only at the response from GetBucketPolicyStatus, and if its "IsPublic" response field is true, we skip applying an ACL. We could then also confirm via GetBucketOwnershipControls that no ACLs are in use. If its response contains a rule with an ObjectOwnership value of "BucketOwnerPreferred" or "ObjectWriter" (rather than "BucketOwnerEnforced"), then we'd still need to apply a per-object ACL.

If we did that, we'd need to grant the "s3:GetBucketOwnershipControls" IAM action to principals that run the kops update cluster command.
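The augmented predicate proposed above could be sketched as follows. This is illustrative Go, independent of the AWS SDK and of kOps's actual identifiers; the two inputs stand in for the GetBucketPolicyStatus "IsPublic" field and the GetBucketOwnershipControls ObjectOwnership value.

```go
package main

import "fmt"

// ObjectOwnership values as returned by GetBucketOwnershipControls.
const (
	BucketOwnerEnforced  = "BucketOwnerEnforced"
	BucketOwnerPreferred = "BucketOwnerPreferred"
	ObjectWriter         = "ObjectWriter"
)

// needsPublicReadACL reports whether kOps would still need to apply the
// per-object "public-read" ACL. The ACL is skipped only when the bucket
// policy already makes the bucket public AND ACLs are disabled outright
// (Object Ownership is "BucketOwnerEnforced").
func needsPublicReadACL(bucketPolicyIsPublic bool, objectOwnership string) bool {
	if !bucketPolicyIsPublic {
		// No public bucket policy: only a per-object ACL can grant access.
		return true
	}
	// The bucket is "public" via its policy, but if ACLs are still in
	// effect, the per-object ACL remains necessary.
	return objectOwnership == BucketOwnerPreferred || objectOwnership == ObjectWriter
}

func main() {
	fmt.Println(needsPublicReadACL(true, BucketOwnerEnforced))  // policy suffices: skip ACL
	fmt.Println(needsPublicReadACL(true, BucketOwnerPreferred)) // ACLs in use: apply ACL
	fmt.Println(needsPublicReadACL(false, BucketOwnerEnforced)) // not public: apply ACL
}
```

Today's behavior corresponds to checking only the first argument; the second argument is the proposed refinement.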

seh avatar Apr 07 '23 13:04 seh

Office hours:

  • In favor of adding call to GetBucketOwnershipControls
  • Will look into whether dryrun could be made to output manifest diffs
  • Considering tying addon upgrades to k8s versions

johngmyers avatar Apr 07 '23 16:04 johngmyers

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 06 '23 16:07 k8s-triage-robot

/help

johngmyers avatar Jul 07 '23 00:07 johngmyers

@johngmyers: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jul 07 '23 00:07 k8s-ci-robot

/remove-lifecycle stale

vaibhav2107 avatar Aug 03 '23 13:08 vaibhav2107