
kops version 1.30.4 still has the issue addressed by https://github.com/kubernetes/kops/pull/17161

vijaymailb opened this issue 11 months ago • 3 comments

/kind bug

1. What kops version are you running? 1.30.4

2. What Kubernetes version are you running? 1.30.8

3. What cloud provider are you using? AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

kops update cluster --yes --lifecycle-overrides IAMRole=ExistsAndWarnIfChanges,IAMRolePolicy=ExistsAndWarnIfChanges,IAMInstanceProfileRole=ExistsAndWarnIfChanges
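For context, a minimal sketch of one kind of credential setup under which this failure can occur; the profile name and account ID below are hypothetical placeholders, not taken from the issue. The key point is that kops is invoked with credentials that were already obtained by assuming kops-admin.test.io:

# ~/.aws/config (hypothetical; profile name and account ID are placeholders)
[profile kops-admin]
role_arn = arn:aws:iam::<account-id>:role/kops-admin.test.io
source_profile = default

# Invocation under that profile:
AWS_PROFILE=kops-admin kops update cluster --yes --lifecycle-overrides IAMRole=ExistsAndWarnIfChanges,IAMRolePolicy=ExistsAndWarnIfChanges,IAMInstanceProfileRole=ExistsAndWarnIfChanges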

5. What happened after the commands executed?

SDK 2025/01/31 08:40:37 DEBUG request failed with unretryable error https response error StatusCode: 403, RequestID: 88d417e4-f17d-4401-ad21-86965447eb75, api error AccessDenied: User: arn:aws:sts:::assumed-role/kops-admin.test.io/aws-go-sdk-1738312837464117679 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam:::role/kops-admin.test.io

Error: error determining default DNS zone: error querying zones: error listing hosted zones: operation error Route 53: ListHostedZones, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 88d417e4-f17d-4401-ad21-86965447eb75, api error AccessDenied: User: arn:aws:sts:::assumed-role/kops-admin.test.io/aws-go-sdk-1738312837464117679 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam:::role/kops-admin.test.io
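For anyone debugging the same failure, the denial can be checked outside of kops with standard AWS CLI calls run under the same credentials (the account ID below is a placeholder):

# Shows the identity the SDK is already running as (an assumed-role session)
aws sts get-caller-identity

# Fails with the same AccessDenied when the role's trust policy does not
# allow the role to assume itself
aws sts assume-role \
  --role-arn arn:aws:iam::<account-id>:role/kops-admin.test.io \
  --role-session-name self-assume-check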

6. What did you expect to happen? The above error should no longer occur, since this issue was supposed to be resolved in kops 1.30.4 by https://github.com/kubernetes/kops/pull/17161.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

kind: Cluster
metadata:
  creationTimestamp: null
spec:
  additionalSans:
  api:
    loadBalancer:
      additionalSecurityGroups:
      class: Network
      type: Internal
  assets:
  authentication:
    aws:
      backendMode: CRD
  authorization:
    rbac: {}
  certManager:
    enabled: true
  channel: stable
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true
  cloudControllerManager:
    cloudProvider: aws
    image: repo_name/provider-aws/cloud-controller-manager:v1.29.6
  cloudLabels:
    Application: kubernetes
    Product: kubernetes
    Terraform: "true"
    department: ips
    environment: dev
    stage: feature
  cloudProvider: aws
  containerd:
    logLevel: warn
    nvidiaGPU:
      enabled: true
    runc:
      version: 1.1.12
    version: 1.7.16
  etcdClusters:
  - etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-central-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-eu-central-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-eu-central-1c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: extensive
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
        value: 7d
      - name: ETCD_MANAGER_DAILY_BACKUPS_RETENTION
        value: 14d
      logLevel: 1
    name: main
  - etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-central-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-eu-central-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-eu-central-1c
      name: c
    manager:
      logLevel: 1
    name: events
  fileAssets:
  - content: "apiVersion: audit.k8s.io/v1 # This is required.\nkind: Policy\n# Don't
      generate audit events for all requests in RequestReceived stage.\nomitStages:\n
      \ - \"RequestReceived\"\nrules:\n  # Log pod changes at RequestResponse level\n
      \ - level: RequestResponse\n    verbs: [\"create\", \"patch\", \"update\", \"delete\",
      \"deletecollection\"]\n    resources:\n    - group: \"\"\n      # Resource \"pods\"
      doesn't match requests to any subresource of pods,\n      # which is consistent
      with the RBAC policy.\n      resources: [\"pods\"]\n\n  # Don't log requests
      to a configmap called \"controller-leader\"\n  - level: None\n    resources:\n
      \   - group: \"\"\n      resources: [\"configmaps\"]\n      resourceNames: [\"controller-leader\"]\n
      \ \n  # Log configmap and secret changes at the Metadata level.\n  - level:
      Metadata\n    resources:\n    - group: \"\" # core API group\n      resources:
      [\"secrets\", \"configmaps\"]\n\n  # Do not log from kube-system and nodes accounts\n
      \ - level: None\n    userGroups:\n    - system:serviceaccounts:kube-system\n
      \   - system:nodes\n\n  # Do not log from some system users\n  - level: None\n
      \   users:\n    - system:apiserver\n    - system:kube-proxy\n    - system:kube-scheduler\n
      \   - system:kube-controller-manager\n    - system:node\n    - system:serviceaccount:core-stack:cluster-autoscaler-chart-aws-cluster-autoscaler\n
      \   - system:serviceaccount:core-stack:external-dns\n    - system:serviceaccount:istio-system:istiod-service-account\n
      \   - system:volume-scheduler\n\n  # Don't log these read-only URLs.\n  - level:
      None\n    nonResourceURLs:\n    - \"/healthz*\"\n    - \"/version\"\n    - \"/swagger*\"\n
      \   - \"/logs\"\n    - \"/metrics\"\n\n  # Don't log authenticated requests
      to certain non-resource URL paths.\n  - level: None\n    userGroups: [\"system:authenticated\"]\n
      \   nonResourceURLs:\n    - \"/api*\" # Wildcard matching.\n    - \"/version\"\n\n
      \ # Log on Metadata from  gatekeeper accounts\n  - level: Metadata\n    userGroups:\n
      \   - system:serviceaccounts:gatekeeper-system\n\n  # Log All changes at RequestResponse
      level\n  - level: RequestResponse\n    verbs: [\"create\", \"patch\", \"update\",
      \"delete\", \"deletecollection\"]\n\n  # Log all other resources in core and
      extensions at the Request level.\n  - level: Request\n    resources:\n    -
      group: \"\" # core API group\n    - group: \"extensions\" # Version of group
      should NOT be included.\n\n  # A catch-all rule to log all other requests at
      the Metadata level.\n  - level: Metadata\n    # Long-running requests like watches
      that fall under this rule will not\n    # generate an audit event in RequestReceived.\n
      \   omitStages:\n      - \"RequestReceived\"\n"
    name: audit-policy-config
    path: /srv/kubernetes/kube-apiserver/audit-policy-config.yaml
    roles:
    - ControlPlane
  iam:
    allowContainerRegistry: true
    legacy: false
    useServiceAccountExternalPermissions: false
  kubeAPIServer:
    auditLogMaxAge: 10
    auditLogMaxBackups: 1
    auditLogMaxSize: 100
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditPolicyFile: /srv/kubernetes/kube-apiserver/audit-policy-config.yaml
    cloudProvider: external
    oidcClientID: kubernetes
    oidcGroupsClaim: groups
  kubeDNS:
    nodeLocalDNS:
      enabled: true
      forwardToKubeDNS: true
    provider: CoreDNS
    tolerations:
    - effect: NoSchedule
      key: component
      operator: Equal
      value: core
    - key: CriticalAddonsOnly
      operator: Exists
  kubeProxy:
    metricsBindAddress: 0.0.0.0
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
  kubernetesVersion: 1.30.8
  networkCIDR: 10.151.24.0/21
  networkID: vpc-099f946314a45d38c
  networking:
    cni: {}
  nodeTerminationHandler:
    cpuRequest: 10m
    enableRebalanceDraining: false
    enableRebalanceMonitoring: false
    enableSQSTerminationDraining: false
    enabled: true
    prometheusEnable: true
  nonMasqueradeCIDR: 100.64.0.0/10
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://oidc-f0122fa2d-987360431102
    enableAWSOIDCProvider: true
  topology:
    dns:
      type: Private

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know?

vijaymailb • Feb 03 '25

Same

aramhakobyan • Feb 03 '25

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot • May 04 '25

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot • Jun 03 '25

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot • Jul 03 '25

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot • Jul 03 '25

FWIW, when using IRSA and kops 1.31.1, I get an error where kops tries to assume the role configured in the AWS environment variables again. This was supported by default until 2023, and it can still be made to work by adjusting the role policy, but it would be nice not to have to.

akloss-cibo • Oct 07 '25
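As a rough sketch of the role-policy adjustment mentioned in the previous comment (my reading, not confirmed by the commenter): AWS removed implicit role self-assumption, so a role that needs to assume itself must now be allowed to do so explicitly. One way is to merge a statement like the following into the role's existing trust policy; the account ID is a placeholder, and the role's existing trust statements must be preserved:

{
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::<account-id>:role/kops-admin.test.io" },
  "Action": "sts:AssumeRole"
}

# Apply the merged trust-policy document (standard AWS CLI call; the file
# name is a placeholder):
aws iam update-assume-role-policy \
  --role-name kops-admin.test.io \
  --policy-document file://trust-policy.json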