
kops export --internal gives error on 1.26.3

Open SCLogo opened this issue 2 years ago • 31 comments

/kind bug

1. What kops version are you running? The command kops version will display this information.
   1.26.3
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
   1.26.3
3. What cloud provider are you using?
   aws
4. What commands did you run? What is the simplest way to reproduce this issue?
   kops export kubecfg --state --admin --internal
   k get pods
5. What happened after the commands executed?

E0612 11:43:20.331105   19527 memcache.go:265] couldn't get current server API group list: Get "https://api.internal.testcluster-api.control.example.com:8443/api?timeout=32s": dial tcp 192.168.0.98:8443: connect: connection refused

6. What did you expect to happen?
   Able to get pods on the cluster.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2023-06-05T14:54:56Z"
  name: testcluster-api.control.example.com
spec:
  api:
    loadBalancer:
      class: Network
      sslCertificate: arn:aws:acm:eu-east-6:12345678:certificate/46456456456-erg45t54-45645g45-343
      type: Internal
  authorization:
    rbac: {}
  channel: stable
  cloudLabels:
    Owner: testcluster
    kops: testcluster.control.example.com
  cloudProvider: aws
  configBase: s3://testcluster/testcluster.control.example.com
  containerRuntime: containerd
  encryptionConfig: true
  iam:
    legacy: false
  kubeControllerManager:
    enableProfiling: false
    featureGates:
      RotateKubeletServerCertificate: "true"
    terminatedPodGCThreshold: 10
  kubeScheduler:
    enableProfiling: false
  kubernetesVersion: v1.26.3
  masterPublicName: testcluster-api.control.example.com

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know?
   We changed the masterPublicName in the config.

SCLogo avatar Jun 12 '23 09:06 SCLogo

Please explain the problem in detail:

  • What is the last version of kOps it worked with?
  • What is the difference between the kubeconfig generated by the working version versus the one generated by 1.26.3?
  • What is IP 192.168.0.98?
  • How are you trying to connect to 192.168.0.98 (internal network)?
  • ....

hakman avatar Jun 12 '23 11:06 hakman

The last version it worked with was 1.25.4. The IP is just a fake IP. The cluster was created with the 1.26.3 binary and then the config was exported with the 1.25.4 binary. In that case it worked, so there were no changes in the config.

When I tried to export with 1.26.3, I got this error.

SCLogo avatar Jun 13 '23 11:06 SCLogo

The cluster was created with the 1.26.3 binary and then the config was exported with the 1.25.4 binary. In that case it worked, so there were no changes in the config.

What is the difference in ~/.kube/config when exported with 1.25.4 vs 1.26.3?
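
A minimal sketch of one way to capture both exports for comparison, assuming the two kops binaries are available side by side (binary names and paths are placeholders, and the --name/--state flags are omitted as in the original commands):

$ kops-1.25.4 export kubecfg --admin --internal && cp ~/.kube/config /tmp/kubeconfig-1.25.4
$ kops-1.26.3 export kubecfg --admin --internal && cp ~/.kube/config /tmp/kubeconfig-1.26.3
$ diff /tmp/kubeconfig-1.25.4 /tmp/kubeconfig-1.26.3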

hakman avatar Jun 13 '23 12:06 hakman

The only difference is:

On 1.25.4, the config contains:

clusters:
- cluster:
      server: https://internal.cluster-api.control.example.com:8443

On 1.26.3:

clusters:
- cluster:
      server: https://api.internal.cluster-api.control.example.com:8443

SCLogo avatar Jun 13 '23 13:06 SCLogo

Where did https://internal.cluster-api.control.example.com:8443 come from? Do you also change MasterInternalName somewhere?

hakman avatar Jun 13 '23 15:06 hakman

In the config we set masterInternalName and masterPublicName, but when I run kops update cluster I don't see the internal one in the final config.

SCLogo avatar Jun 14 '23 08:06 SCLogo

I still don't understand where you get the https://internal.cluster-api.control.example.com:8443. The internal address of the API server is api.internal.<CLUSTER_NAME>.

hakman avatar Jun 14 '23 08:06 hakman

With Ansible we set the following lines after creating the cluster config with kops create cluster:

masterPublicName: https://cluster-api.control.example.com:8443
masterInternalName: https://internal.cluster-api.control.example.com:8443

After that, a kops update cluster applies all the settings.

SCLogo avatar Jun 14 '23 11:06 SCLogo

As of 1.26, changing masterInternalName is no longer allowed. kops export kubecfg --admin --internal will always set server: https://api.internal.cluster-api.control.example.com.

Please try to use additionalSANs instead, if that's possible with your workflow. https://pkg.go.dev/k8s.io/kops/pkg/apis/kops#APISpec
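
A minimal sketch of what that could look like in the cluster manifest (illustrative only; field placement can differ between kOps API versions, and a later comment in this thread reports it working at spec.additionalSANs with v1alpha2):

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
spec:
  additionalSANs:
  # extra name to be added to the API server certificate (placeholder value)
  - internal.cluster-api.control.example.com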

hakman avatar Jun 14 '23 12:06 hakman

Last time I tried that, but had a problem with the export... maybe at that time we could not export using the SANs address only?

Is that possible? Our real goal is to have API access without an LB, only using NodePorts.

SCLogo avatar Jun 14 '23 14:06 SCLogo

kOps will only set and update the DNS record for api.internal.cluster-api.control.example.com. It is a subdomain of what you are already using, so it should work just fine. Not sure if there's other logic in play though. https://github.com/kubernetes/kops/blob/feedb1b2bbc07ed151a1ac6f0372cbb36358ea65/nodeup/pkg/model/kube_apiserver.go#L759-L761
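
A rough way to check what that code path produces on a running cluster, assuming the kube-apiserver static pod carries the k8s-app=kube-apiserver label and the dns.alpha.kubernetes.io/internal annotation mentioned later in this thread (the dig lookup only resolves from inside the VPC for a private zone):

$ kubectl -n kube-system get pods -l k8s-app=kube-apiserver \
    -o jsonpath='{.items[0].metadata.annotations.dns\.alpha\.kubernetes\.io/internal}'
$ dig +short api.internal.cluster-api.control.example.com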

hakman avatar Jun 14 '23 15:06 hakman

The api subdomain would be OK for us as well, but as far as I can see, the export uses the DNS name of the LB and not api.internal.cluster-api.control.example.com with port 8443. Any idea?

SCLogo avatar Jun 21 '23 05:06 SCLogo

Do you also set UseForInternalAPI somewhere?

hakman avatar Jun 21 '23 06:06 hakman

No. Probably that is the missing part?

SCLogo avatar Jun 21 '23 06:06 SCLogo

Please paste the cluster spec with all the fields. I don't think what's in the main comment is complete. For example, topology is missing. The create cluster command would also be useful.

hakman avatar Jun 21 '23 06:06 hakman

config:

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2023-06-21T08:22:34Z"
  name: testcluster-api.control.example.com
spec:
  rollingUpdate:
    maxSurge: 50%
  kubeControllerManager:
    enableProfiling: false
    terminatedPodGCThreshold: 10
    featureGates:
      RotateKubeletServerCertificate: "true"
  kubeScheduler:
    enableProfiling: false
  kubeAPIServer:
    enableProfiling: false
    auditLogMaxAge: 30
    auditLogMaxBackups: 10
    auditLogMaxSize: 100
    requestTimeout: 3m0s
    auditLogPath: /var/log/apiserver/audit.log
    auditPolicyFile: /srv/kubernetes/kube-apiserver/audit-policy-config.yaml
    admissionControlConfigFile: /srv/kubernetes/kube-apiserver/admission-configuration.yaml
    appendAdmissionPlugins:
    - EventRateLimit
  masterPublicName: testcluster-api.control.example.com
  masterInternalName: internal.testcluster-api.control.example.com
  assets:
    containerProxy: "external-docker-registry"
  additionalPolicies:
    master: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "kms:Decrypt",
            "kms:Encrypt"
          ],
          "Resource": [
            "aws_arn_key"
          ]
        }
      ]

  fileAssets:
  - name: aws-encryption-provider.yaml
    path: /etc/kubernetes/manifests/aws-encryption-provider.yaml
    roles:
    - ControlPlane
    content: |
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          scheduler.alpha.kubernetes.io/critical-pod: ""
        labels:
          k8s-app: aws-encryption-provider
        name: aws-encryption-provider
        namespace: kube-system
      spec:
        # since runs on a master node, pull-secret needs to be available immediately.
        # we're using the kops create secret dockerconfig. alternatively, we could use
        # a public image
        containers:
        - image: external-docker-registry/aws-encryption-provider:tag
          name: aws-encryption-provider
          command:
          - /aws-encryption-provider
          - --key=aws_arn_key
          - --region=aws_region
          - --listen=/srv/kubernetes/kube-apiserver/socket.sock
          - --health-port=:8083
          ports:
          - containerPort: 8083
            protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8083
          volumeMounts:
          - mountPath: /srv/kubernetes/kube-apiserver
            name: kmsplugin
        hostNetwork: true
        priorityClassName: system-cluster-critical

        # we're using this volume for other files too
        volumes:
        - name: kmsplugin
          hostPath:
            path: /srv/kubernetes/kube-apiserver
            type: DirectoryOrCreate
  - content: |
      apiVersion: eventratelimit.admission.k8s.io/v1alpha1
      kind: Configuration
      limits:
      - type: Namespace
        qps: 50
        burst: 100
        cacheSize: 2000
      - type: User
        qps: 10
        burst: 50
    name: event-rate-configuration
    path: /srv/kubernetes/kube-apiserver/eventconfig.yaml
    roles:
    - ControlPlane
  - content: |
      apiVersion: apiserver.config.k8s.io/v1
      kind: AdmissionConfiguration
      plugins:
      - name: EventRateLimit
        path: /srv/kubernetes/kube-apiserver/eventconfig.yaml
    name: admission-configuration
    path: /srv/kubernetes/kube-apiserver/admission-configuration.yaml
    roles:
    - ControlPlane

  - name: audit-policy-config
    path: /srv/kubernetes/kube-apiserver/audit-policy-config.yaml
    roles:
    - ControlPlane
    content: |
      apiVersion: audit.k8s.io/v1 # This is required.
      kind: Policy
      # Don't generate audit events for all requests in RequestReceived stage.
      omitStages:
        - "RequestReceived"
      rules:
        # Log pod changes at RequestResponse level
        - level: RequestResponse
          resources:
          - group: ""
            # Resource "pods" doesn't match requests to any subresource of pods,
            # which is consistent with the RBAC policy.
            resources: ["pods", "deployments"]

        - level: RequestResponse
          resources:
          - group: "rbac.authorization.k8s.io"
            resources: ["clusterroles", "clusterrolebindings"]

        # Log "pods/log", "pods/status" at Metadata level
        - level: Metadata
          resources:
          - group: ""
            resources: ["pods/log", "pods/status"]

        # Don't log requests to a configmap called "controller-leader"
        - level: None
          resources:
          - group: ""
            resources: ["configmaps"]
            resourceNames: ["controller-leader"]
        # Don't log watch requests by the "system:kube-proxy" on endpoints or services
        - level: None
          users: ["system:kube-proxy"]
          verbs: ["watch"]
          resources:
          - group: "" # core API group
            resources: ["endpoints", "services"]

        # Don't log authenticated requests to certain non-resource URL paths.
        - level: None
          userGroups: ["system:authenticated"]
          nonResourceURLs:
          - "/api*" # Wildcard matching.
          - "/version"

        # Log the request body of configmap changes in kube-system.
        - level: Request
          resources:
          - group: "" # core API group
            resources: ["configmaps"]
          # This rule only applies to resources in the "kube-system" namespace.
          # The empty string "" can be used to select non-namespaced resources.
          namespaces: ["kube-system"]

        # Log configmap changes in all other namespaces at the RequestResponse level.
        - level: RequestResponse
          resources:
          - group: "" # core API group
            resources: ["configmaps"]

        # Log secret changes in all namespaces at the Metadata level.
        - level: Metadata
          resources:
          - group: "" # core API group
            resources: ["secrets"]

        # Log all other resources in core and extensions at the Request level.
        - level: Request
          resources:
          - group: "" # core API group
          - group: "extensions" # Version of group should NOT be included.

        # A catch-all rule to log all other requests at the Metadata level.
        - level: Metadata
          # Long-running requests like watches that fall under this rule will not
          # generate an audit event in RequestReceived.
          omitStages:
            - "RequestReceived"
  encryptionConfig: true
  api:
    loadBalancer:
      class: Network
      sslCertificate: aws_ssl_cert
      type: Internal
      UseForInternalAPI: true
  authorization:
    rbac: {}
  channel: stable
  cloudLabels:
    Owner: company
    delete: ""
    kops: testcluster-api.control.example.com
  cloudProvider: aws
  configBase: s3://config/testcluster-api.control.example.com
  containerRuntime: containerd
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      kmsKeyId: aws_kms_key_id
      instanceGroup: control-plane-aws_region
      name: a
    memoryRequest: 100Mi
    manager:
      env:
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
        value: 60d
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      kmsKeyId: aws_kms_key_id
      instanceGroup: control-plane-aws_region
      name: a
    memoryRequest: 100Mi
    manager:
      env:
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
        value: 60d
    name: events
  iam:
    legacy: false
  kubelet:
    anonymousAuth: false
    evictionMaxPodGracePeriod: 120
    evictionSoftGracePeriod: memory.available=30s,imagefs.available=30s,nodefs.available=30s,imagefs.inodesFree=30s,nodefs.inodesFree=30s
    evictionSoft: memory.available<250Mi,nodefs.available<15%,nodefs.inodesFree<10%,imagefs.available<20%,imagefs.inodesFree<10%
    evictionHard: memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<15%,imagefs.inodesFree<5%
    readOnlyPort: 0
    eventQPS: 0
    featureGates: { RotateKubeletServerCertificate: "true" }
    kernelMemcgNotification: true
    protectKernelDefaults: true
    authorizationMode: Webhook
    authenticationTokenWebhook: true
  kubernetesApiAccess:
  - 192.168.0.0/24
  kubernetesVersion: v1.26.3
  networkCIDR: 192.168.2.0/23
  networkID: vpc-34567juytgrfed
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 192.168.0.0/24
  subnets:
  - cidr: 192.168.2.0/26
    id: subnet-040987b99943b6cec
    name: aws_region
    type: Private
    zone: aws_region
  - cidr: 192.168.2.192/26
    id: subnet-091cac4d38a2ee136
    name: utility-aws_region
    type: Utility
    zone: aws_region
  topology:
    dns:
      type: Private
    masters: private
    nodes: private

SCLogo avatar Jun 21 '23 11:06 SCLogo

cluster.spec:


apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2023-06-21T08:22:34Z"
  name: testcluster-api.control.example.com
spec:
  additionalPolicies:
    master: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "kms:Decrypt",
            "kms:Encrypt"
          ],
          "Resource": [
            "aws_arn_key"
          ]
        }
      ]
  api:
    loadBalancer:
      class: Network
      sslCertificate: aws_ssl_cert
      type: Internal
  assets:
    containerProxy: external-docker-registry
  authorization:
    rbac: {}
  channel: stable
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true
      version: v1.14.1
    manageStorageClasses: true
  cloudControllerManager:
    allocateNodeCIDRs: true
    clusterCIDR: 100.64.0.0/10
    clusterName: testcluster-api.control.example.com
    configureCloudRoutes: false
    image: registry.k8s.io/provider-aws/cloud-controller-manager:v1.26.0
    leaderElection:
      leaderElect: true
  cloudLabels:
    Owner: company
    delete: ""
    kops: testcluster-api.control.example.com
  cloudProvider: aws
  clusterDNSDomain: cluster.local
  configBase: s3://config/testcluster-api.control.example.com
  configStore: s3://config/testcluster-api.control.example.com
  containerRuntime: containerd
  containerd:
    logLevel: info
    runc:
      version: 1.1.4
    version: 1.6.18
  dnsZone: aws_dns_zone
  docker:
    skipInstall: true
  encryptionConfig: true
  etcdClusters:
  - backups:
      backupStore: s3://config/testcluster-api.control.example.com/backups/etcd/main
    cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-aws-region
      kmsKeyId: aws_kms_key_id
      name: a
    manager:
      env:
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
    memoryRequest: 100Mi
    name: main
    version: 3.5.7
  - backups:
      backupStore: s3://config/testcluster-api.control.example.com/backups/etcd/events
    cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-aws_region
      kmsKeyId: aws_kms_key_id
      name: a
    manager:
      env:
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
    memoryRequest: 100Mi
    name: events
    version: 3.5.7
  externalDns:
    provider: dns-controller
  fileAssets:
  - content: |
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          scheduler.alpha.kubernetes.io/critical-pod: ""
        labels:
          k8s-app: aws-encryption-provider
        name: aws-encryption-provider
        namespace: kube-system
      spec:
        # since runs on a master node, pull-secret needs to be available immediately.
        # we're using the kops create secret dockerconfig. alternatively, we could use
        # a public image
        containers:
        - image: external-docker_registry/aws-encryption-provider:tag
          name: aws-encryption-provider
          command:
          - /aws-encryption-provider
          - --key=aws_arn_key
          - --region=us-east-2
          - --listen=/srv/kubernetes/kube-apiserver/socket.sock
          - --health-port=:8083
          ports:
          - containerPort: 8083
            protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8083
          volumeMounts:
          - mountPath: /srv/kubernetes/kube-apiserver
            name: kmsplugin
        hostNetwork: true
        priorityClassName: system-cluster-critical

        # we're using this volume for other files too
        volumes:
        - name: kmsplugin
          hostPath:
            path: /srv/kubernetes/kube-apiserver
            type: DirectoryOrCreate
    name: aws-encryption-provider.yaml
    path: /etc/kubernetes/manifests/aws-encryption-provider.yaml
    roles:
    - ControlPlane
  - content: |
      apiVersion: eventratelimit.admission.k8s.io/v1alpha1
      kind: Configuration
      limits:
      - type: Namespace
        qps: 50
        burst: 100
        cacheSize: 2000
      - type: User
        qps: 10
        burst: 50
    name: event-rate-configuration
    path: /srv/kubernetes/kube-apiserver/eventconfig.yaml
    roles:
    - ControlPlane
  - content: |
      apiVersion: apiserver.config.k8s.io/v1
      kind: AdmissionConfiguration
      plugins:
      - name: EventRateLimit
        path: /srv/kubernetes/kube-apiserver/eventconfig.yaml
    name: admission-configuration
    path: /srv/kubernetes/kube-apiserver/admission-configuration.yaml
    roles:
    - ControlPlane
  - content: |
      apiVersion: audit.k8s.io/v1 # This is required.
      kind: Policy
      # Don't generate audit events for all requests in RequestReceived stage.
      omitStages:
        - "RequestReceived"
      rules:
        # Log pod changes at RequestResponse level
        - level: RequestResponse
          resources:
          - group: ""
            # Resource "pods" doesn't match requests to any subresource of pods,
            # which is consistent with the RBAC policy.
            resources: ["pods", "deployments"]

        - level: RequestResponse
          resources:
          - group: "rbac.authorization.k8s.io"
            resources: ["clusterroles", "clusterrolebindings"]

        # Log "pods/log", "pods/status" at Metadata level
        - level: Metadata
          resources:
          - group: ""
            resources: ["pods/log", "pods/status"]

        # Don't log requests to a configmap called "controller-leader"
        - level: None
          resources:
          - group: ""
            resources: ["configmaps"]
            resourceNames: ["controller-leader"]
        # Don't log watch requests by the "system:kube-proxy" on endpoints or services
        - level: None
          users: ["system:kube-proxy"]
          verbs: ["watch"]
          resources:
          - group: "" # core API group
            resources: ["endpoints", "services"]

        # Don't log authenticated requests to certain non-resource URL paths.
        - level: None
          userGroups: ["system:authenticated"]
          nonResourceURLs:
          - "/api*" # Wildcard matching.
          - "/version"

        # Log the request body of configmap changes in kube-system.
        - level: Request
          resources:
          - group: "" # core API group
            resources: ["configmaps"]
          # This rule only applies to resources in the "kube-system" namespace.
          # The empty string "" can be used to select non-namespaced resources.
          namespaces: ["kube-system"]

        # Log configmap changes in all other namespaces at the RequestResponse level.
        - level: RequestResponse
          resources:
          - group: "" # core API group
            resources: ["configmaps"]

        # Log secret changes in all namespaces at the Metadata level.
        - level: Metadata
          resources:
          - group: "" # core API group
            resources: ["secrets"]

        # Log all other resources in core and extensions at the Request level.
        - level: Request
          resources:
          - group: "" # core API group
          - group: "extensions" # Version of group should NOT be included.

        # A catch-all rule to log all other requests at the Metadata level.
        - level: Metadata
          # Long-running requests like watches that fall under this rule will not
          # generate an audit event in RequestReceived.
          omitStages:
            - "RequestReceived"
    name: audit-policy-config
    path: /srv/kubernetes/kube-apiserver/audit-policy-config.yaml
    roles:
    - ControlPlane
  iam:
    legacy: false
  keyStore: s3://config/testcluster-api.control.example.com/pki
  kubeAPIServer:
    admissionControlConfigFile: /srv/kubernetes/kube-apiserver/admission-configuration.yaml
    allowPrivileged: true
    anonymousAuth: false
    apiAudiences:
    - kubernetes.svc.default
    apiServerCount: 1
    appendAdmissionPlugins:
    - EventRateLimit
    auditLogMaxAge: 30
    auditLogMaxBackups: 10
    auditLogMaxSize: 100
    auditLogPath: /var/log/apiserver/audit.log
    auditPolicyFile: /srv/kubernetes/kube-apiserver/audit-policy-config.yaml
    authorizationMode: Node,RBAC
    bindAddress: 0.0.0.0
    cloudProvider: external
    enableAdmissionPlugins:
    - NamespaceLifecycle
    - LimitRanger
    - ServiceAccount
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - MutatingAdmissionWebhook
    - ValidatingAdmissionWebhook
    - NodeRestriction
    - ResourceQuota
    - EventRateLimit
    enableProfiling: false
    etcdServers:
    - https://127.0.0.1:4001
    etcdServersOverrides:
    - /events#https://127.0.0.1:4002
    featureGates:
      CSIMigrationAWS: "true"
      InTreePluginAWSUnregister: "true"
    image: external-docker-registry/kube-apiserver:v1.26.3
    kubeletPreferredAddressTypes:
    - InternalIP
    - Hostname
    - ExternalIP
    logLevel: 2
    requestTimeout: 3m0s
    requestheaderAllowedNames:
    - aggregator
    requestheaderExtraHeaderPrefixes:
    - X-Remote-Extra-
    requestheaderGroupHeaders:
    - X-Remote-Group
    requestheaderUsernameHeaders:
    - X-Remote-User
    securePort: 443
    serviceAccountIssuer: https://api.internal.testcluster-api.control.example.com
    serviceAccountJWKSURI: https://api.internal.testcluster-api.control.example.com/openid/v1/jwks
    serviceClusterIPRange: 100.64.0.0/13
    storageBackend: etcd3
  kubeControllerManager:
    allocateNodeCIDRs: true
    attachDetachReconcileSyncPeriod: 1m0s
    cloudProvider: external
    clusterCIDR: 100.96.0.0/11
    clusterName: testcluster-api.control.example.com
    configureCloudRoutes: false
    enableProfiling: false
    featureGates:
      CSIMigrationAWS: "true"
      InTreePluginAWSUnregister: "true"
      RotateKubeletServerCertificate: "true"
    image: external-docker-registry/kube-controller-manager:v1.26.3
    leaderElection:
      leaderElect: true
    logLevel: 2
    terminatedPodGCThreshold: 10
    useServiceAccountCredentials: true
  kubeDNS:
    cacheMaxConcurrent: 150
    cacheMaxSize: 1000
    cpuRequest: 100m
    domain: cluster.local
    memoryLimit: 170Mi
    memoryRequest: 70Mi
    nodeLocalDNS:
      cpuRequest: 25m
      enabled: false
      image: registry.k8s.io/dns/k8s-dns-node-cache:1.22.20
      memoryRequest: 5Mi
    provider: CoreDNS
    serverIP: 100.64.0.10
  kubeProxy:
    clusterCIDR: 100.96.0.0/11
    cpuRequest: 100m
    image: external-docker-registry/kube-proxy:v1.26.3
    logLevel: 2
  kubeScheduler:
    enableProfiling: false
    featureGates:
      CSIMigrationAWS: "true"
      InTreePluginAWSUnregister: "true"
    image: external-docker-registry/kube-scheduler:v1.26.3
    leaderElection:
      leaderElect: true
    logLevel: 2
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    cgroupDriver: systemd
    cgroupRoot: /
    cloudProvider: external
    clusterDNS: 100.64.0.10
    clusterDomain: cluster.local
    enableDebuggingHandlers: true
    eventQPS: 0
    evictionHard: memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<15%,imagefs.inodesFree<5%
    evictionMaxPodGracePeriod: 120
    evictionSoft: memory.available<250Mi,nodefs.available<15%,nodefs.inodesFree<10%,imagefs.available<20%,imagefs.inodesFree<10%
    evictionSoftGracePeriod: memory.available=30s,imagefs.available=30s,nodefs.available=30s,imagefs.inodesFree=30s,nodefs.inodesFree=30s
    featureGates:
      CSIMigrationAWS: "true"
      InTreePluginAWSUnregister: "true"
      RotateKubeletServerCertificate: "true"
    kernelMemcgNotification: true
    kubeconfigPath: /var/lib/kubelet/kubeconfig
    logLevel: 2
    podInfraContainerImage: external-docker_registry
    podManifestPath: /etc/kubernetes/manifests
    protectKernelDefaults: true
    readOnlyPort: 0
    registerSchedulable: true
    shutdownGracePeriod: 30s
    shutdownGracePeriodCriticalPods: 10s
  kubernetesApiAccess:
  - 192.168.0.0/24
  kubernetesVersion: 1.26.3
  masterKubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    cgroupDriver: systemd
    cgroupRoot: /
    cloudProvider: external
    clusterDNS: 100.64.0.10
    clusterDomain: cluster.local
    enableDebuggingHandlers: true
    eventQPS: 0
    evictionHard: memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<15%,imagefs.inodesFree<5%
    evictionMaxPodGracePeriod: 120
    evictionSoft: memory.available<250Mi,nodefs.available<15%,nodefs.inodesFree<10%,imagefs.available<20%,imagefs.inodesFree<10%
    evictionSoftGracePeriod: memory.available=30s,imagefs.available=30s,nodefs.available=30s,imagefs.inodesFree=30s,nodefs.inodesFree=30s
    featureGates:
      CSIMigrationAWS: "true"
      InTreePluginAWSUnregister: "true"
      RotateKubeletServerCertificate: "true"
    kernelMemcgNotification: true
    kubeconfigPath: /var/lib/kubelet/kubeconfig
    logLevel: 2
    podInfraContainerImage: external-docker-registry/pause:3.6
    podManifestPath: /etc/kubernetes/manifests
    protectKernelDefaults: true
    readOnlyPort: 0
    registerSchedulable: true
    shutdownGracePeriod: 30s
    shutdownGracePeriodCriticalPods: 10s
  masterPublicName: testcluster-api.control.example.com
  networkCIDR: 192.168.2.0/23
  networkID: vpc-0bc336cc29fb4b2a7
  networking:
    calico:
      encapsulationMode: ipip
  nonMasqueradeCIDR: 100.64.0.0/10
  podCIDR: 100.96.0.0/11
  rollingUpdate:
    maxSurge: 50%
  secretStore: s3://config/testcluster-api.control.example.com/secrets
  serviceClusterIPRange: 100.64.0.0/13
  sshAccess:
  - 192.168.0.0/24
  subnets:
  - cidr: 192.168.2.0/26
    id: subnet-040987b99943b6cec
    name: aws_region
    type: Private
    zone: aws_region
  - cidr: 192.168.2.192/26
    id: subnet-091cac4d38a2ee136
    name: utility-aws_region
    type: Utility
    zone: aws_region
  topology:
    dns:
      type: Private
    masters: private
    nodes: private

SCLogo avatar Jun 21 '23 11:06 SCLogo

I tested with a cluster using this command (which should generate a similar cluster):

$ kops-1.27.0-beta.3 --name my.k8s create cluster --cloud=aws --zones eu-central-1a \
  --control-plane-size t3.medium --node-size t3.medium --networking calico \
  --dns=private --topology=private --api-loadbalancer-type internal

The dns.alpha.kubernetes.io/internal: api.internal.my.k8s annotation is present in the kube-apiserver pod. Running

$ kops-1.27.0-beta.3 --name my.k8s export kubeconfig --admin --internal
$ cat ~/.kube/config
...
- cluster:
    certificate-authority-data: ...
    server: https://api.internal.my.k8s
    tls-server-name: api.internal.my.k8s
  name: my.k8s

I can also see the following DNS records:

  • api.my.k8s - A - alias to LB
  • api.my.k8s - AAAA - alias to LB
  • api.internal.my.k8s - A - Simple - IP of control-plane node
  • kops-controller.internal.my.k8s - A - Simple - IP of control-plane node

Looks to me like api.internal.my.k8s is the intended value, resolving to the IP of the API server.
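
A rough way to double-check those records outside the console, assuming the AWS CLI and the private hosted zone ID for the cluster domain (the zone ID below is a placeholder):

$ aws route53 list-resource-record-sets \
    --hosted-zone-id Z0123456789EXAMPLE \
    --query "ResourceRecordSets[?contains(Name, 'api')]"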

hakman avatar Jul 02 '23 06:07 hakman

Do you plan to revert the changes in newer versions or add an option to use the old format?

aramhakobyan avatar Jan 05 '24 11:01 aramhakobyan

Do you plan to revert the changes in newer versions or add an option to use the old format?

What do you mean?

hakman avatar Jan 05 '24 12:01 hakman

@hakman - we are getting an SSL validation error for api.internal.${cluster_fqdn}, because no certificate Subject Alternative Name is generated for api.internal.${cluster_fqdn} in kubernetes-ca -> kubernetes-master.

Here is the error (note: the real cluster name is replaced with ${cluster_fqdn}):

15:39:54  E0105 14:39:54.417672    1672 memcache.go:265] couldn't get current server API group list:
Get "https://api-f27cb3e63-int-2riv26-1297bd82f345d86a.elb.eu-central-1.amazonaws.com:8443/api?timeout=32s": 
tls: failed to verify certificate: x509: certificate is valid for kubernetes, kubernetes.default, 
kubernetes.default.svc, kubernetes.default.svc.cluster.local, ${cluster_fqdn}, internal-${cluster_fqdn}, 
api-f27cb3e63-int-2riv26-1297bd82f345d86a.elb.eu-central-1.amazonaws.com, ->>>> not api.internal.${cluster_fqdn}

We are using AWS, Network LB

 api:
   loadBalancer:
     type: Internal
     class: Network
     additionalSecurityGroups: ["{{.k8s_api_sg_id.value}}"]
     sslCertificate: {{.cluster_certificate_arn.value}}
......
......
masterInternalName: internal-{{.cluster_fqdn.value}}

So in previous versions of kops we were not getting SSL validation errors, since kubernetes-ca contained internal-{{.cluster_fqdn.value}}.

I can see all the SSL Alternative Names by opening the URL https://api-f27cb3e63-int-2riv26-1297bd82f345d86a.elb.eu-central-1.amazonaws.com:8443/api?timeout=32s.

So, in the new version, there is no SSL Alternative Name for "api.internal.${cluster_fqdn}".

Perhaps there is another field to add api.internal.${cluster_fqdn} to the kubernetes-ca -> kubernetes-master Alternative Names?
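
For reference, a quick way to inspect the SANs the endpoint is actually serving, assuming openssl is available (the hostname is the one from the error above; any endpoint presenting the kops-issued certificate would do):

$ openssl s_client -connect api-f27cb3e63-int-2riv26-1297bd82f345d86a.elb.eu-central-1.amazonaws.com:8443 </dev/null 2>/dev/null \
    | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"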

aramhakobyan avatar Jan 05 '24 14:01 aramhakobyan

@aramhakobyan Did you update and roll all the nodes and are you still seeing internal-${cluster_fqdn} and not api.internal.${cluster_fqdn}?

hakman avatar Jan 05 '24 15:01 hakman

@hakman - well, "seeing" we do see it; the problem is not that the api.internal.${cluster_fqdn} DNS record is not generated, but that api.internal.${cluster_fqdn} is not in the kubernetes-ca -> kubernetes-master Alternative Names.

Order of upgrade from 1.25:

kops export kubecfg --admin
kops toolbox template --values cluster-config.json --values instance-groups.yaml --values core-instance-groups.yaml --values ../terraform/values.json --template cluster-template.yaml --snippets snippets --format-yaml
kops replace --force --filename cluster.yaml
kops update cluster --yes --lifecycle-overrides 
kops export kubecfg --admin

kubectl get deployment  ....  -> getting cert error

When I check certs by opening
https://api-f27cb3e63-int-2riv26-1297bd82f345d86a.elb.eu-central-1.amazonaws.com:8443/api?timeout=32s.

[Screenshot 2024-01-05 at 15:44:49: certificate details showing the Subject Alternative Names]

aramhakobyan avatar Jan 05 '24 15:01 aramhakobyan

@aramhakobyan What kops binary version are you using? You should not be able to use masterInternalName anymore and it should not be in the Alternative Names.

hakman avatar Jan 05 '24 16:01 hakman

Order of upgrade from 1.25...

@aramhakobyan I think you are missing a kops rolling-update cluster --yes: https://kops.sigs.k8s.io/cli/kops_rolling-update_cluster/

hakman avatar Jan 05 '24 16:01 hakman

Order of upgrade from 1.25...

@aramhakobyan I think you are missing a kops rolling-update cluster --yes: https://kops.sigs.k8s.io/cli/kops_rolling-update_cluster/

It was not mentioned, but we did run kops rolling-update cluster --yes.

What we see is that during this rolling update, once an old master instance is removed, its IP is also removed by kops from the Route53 record internal-${cluster_fqdn}. Once the last old master is gone, the record is deleted. After that, all remaining worker nodes become NotReady as they still use the host internal-${cluster_fqdn} to talk to the masters.

I tried to manually create a CNAME internal-${cluster_fqdn} pointing to api.internal.${cluster_fqdn} after it was deleted by kops, and that would help, but the problem is that even after adding

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
spec:
  api:
    additionalSANs:
    - internal-{{.cluster_fqdn}}

this SAN is not added to the kops-generated cert, and all old worker nodes complain like this:

certificate is valid for kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, ${cluster_fqdn}, api.internal.${cluster_fqdn}, api-f9672c8f3-ipsk8st-int-084l9p-51f345d8f65d087c.elb.eu-central-1.amazonaws.com, not internal-${cluster_fqdn}

Are we doing something wrong? Is there a way out without downtime and the --cloudonly flag?

gerasym avatar Feb 20 '24 14:02 gerasym

Also, the cluster-completed.spec file from the S3 cluster state bucket does not have

spec:
  api:
    additionalSANs:
    - internal-{{.cluster_fqdn}}

So it looks like it is dropped during some conversion?

gerasym avatar Feb 20 '24 14:02 gerasym

Changing to

spec:
  additionalSANs:
  - internal-{{.cluster_fqdn}}

made kops add the needed SAN.

Is there a way to make kops keep the old record with the new masters' IP addresses, i.e. to have both:

internal-{{.cluster_fqdn}}
api.internal.{{.cluster_fqdn}}

?

Adding a CNAME in the middle of a rolling update is not something we want to do :) I did it manually just to validate the theory.
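
A rough sketch of that manual workaround with the AWS CLI, assuming a private hosted zone for the cluster domain (zone ID and record names below are placeholders):

$ aws route53 change-resource-record-sets \
    --hosted-zone-id Z0123456789EXAMPLE \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "internal-cluster.example.com",
          "Type": "CNAME",
          "TTL": 60,
          "ResourceRecords": [{"Value": "api.internal.cluster.example.com"}]
        }
      }]
    }'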

gerasym avatar Feb 20 '24 16:02 gerasym

@hakman - could you please check the last comments from @gerasym?

aramhakobyan avatar Feb 26 '24 09:02 aramhakobyan

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 26 '24 09:05 k8s-triage-robot