
No instances created if I set the manager to Karpenter by default

boskiv opened this issue 2 years ago • 5 comments

I'm creating a cluster in AWS with the following config, but the cluster ends up with no running nodes.

W0720 15:10:00.715510   31441 validate_cluster.go:232] (will retry): cluster not yet healthy
INSTANCE GROUPS
NAME                            ROLE            MACHINETYPE     MIN     MAX     SUBNETS
control-plane-ap-northeast-1a   ControlPlane    t3.medium       1       1       ap-northeast-1a
nodes-ap-northeast-1a           Node            t3.medium       2       2       ap-northeast-1a
nodes-ap-northeast-1d           Node            t3.medium       1       1       ap-northeast-1d

NODE STATUS
NAME    ROLE    READY

VALIDATION ERRORS
KIND    NAME            MESSAGE
dns     apiserver       Validation Failed

The dns-controller Kubernetes deployment has not updated the Kubernetes cluster's API DNS entry to the correct IP address.  The API DNS IP address is the placeholder address that kops creates: 203.0.113.123.  Please wait about 5-10 minutes for a control plane node to start, dns-controller to launch, and DNS to propagate.  The protokube container and dns-controller deployment logs may contain more diagnostic information.  Etcd and the API DNS entries must be updated for a kops Kubernetes cluster to start.

Validation Failed
W0720 15:10:10.719972   31441 validate_cluster.go:232] (will retry): cluster not yet healthy
Error: validation failed: wait time exceeded during validation
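
A quick way to check that symptom, assuming at least one control-plane instance actually boots (the hostname is taken from the config below):

# Does the API record still point at the kops placeholder (203.0.113.123)?
dig +short api.k8s.kops.uat.aws.xxx.cloud

# If the control plane is reachable, the dns-controller logs mentioned in the
# validation message are the next place to look (protokube logs live on the
# control-plane node itself):
kubectl -n kube-system logs deployment/dns-controller --tail=100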

Is there a way to have the first nodes created by default from the config and then switch them over to the Karpenter scaler afterwards? Or is the only option to edit the instance groups after the cluster has been created?

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  name: k8s.kops.uat.aws.xxx.cloud
spec:
  api:
    dns: {}
  authorization:
    rbac: {}
  awsLoadBalancerController:
    enabled: true
  channel: stable
  certManager:
    enabled: true
  cloudProvider: aws
  configBase: s3://sf-kops-state-store/k8s.kops.uat.aws.xxx.cloud
  dnsZone: kops.uat.aws.xxx.cloud
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-ap-northeast-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-ap-northeast-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  externalDns:
    watchIngress: true
  iam:
    useServiceAccountExternalPermissions: true
    allowContainerRegistry: true
    legacy: false
  karpenter:
    enabled: true
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.27.3
  masterPublicName: api.k8s.kops.uat.aws.xxx.cloud
  metricsServer:
    enabled: true
  nodeProblemDetector:
    enabled: true
    memoryRequest: 32Mi
    cpuRequest: 10m
  networkCIDR: 172.20.0.0/16
  networking:
    amazonvpc: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  podIdentityWebhook:
    enabled: true
  snapshotController:
    enabled: true
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - cidr: 172.20.32.0/19
    name: ap-northeast-1a
    type: Public
    zone: ap-northeast-1a
  - cidr: 172.20.64.0/19
    name: ap-northeast-1d
    type: Public
    zone: ap-northeast-1d
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: k8s.kops.uat.aws.xxx.cloud
  name: control-plane-ap-northeast-1a
spec:
  manager: Karpenter
  image: ami-05ffd9ad4ddd0d6e2
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - ap-northeast-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: k8s.kops.uat.aws.xxx.cloud
  name: nodes-ap-northeast-1a
spec:
  manager: Karpenter
  image: ami-05ffd9ad4ddd0d6e2
  machineType: t3.medium
  maxSize: 2
  minSize: 2
  role: Node
  subnets:
  - ap-northeast-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: k8s.kops.uat.aws.xxx.cloud
  name: nodes-ap-northeast-1d
spec:
  manager: Karpenter
  image: ami-05ffd9ad4ddd0d6e2
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Node
  subnets:
  - ap-northeast-1d
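
One possible answer to the question above (create the first nodes conventionally, then switch to Karpenter) sketched with plain kops commands. This is untested; it assumes the control-plane IG stays on the default manager so it can boot before Karpenter exists, and the manifest filename is hypothetical:

# 1. Apply the manifests with the default instance-group manager everywhere
#    (i.e. no "manager: Karpenter" yet), so ASGs are created and nodes boot.
export KOPS_STATE_STORE=s3://sf-kops-state-store
export NAME=k8s.kops.uat.aws.xxx.cloud

kops create -f cluster.yaml        # cluster + instance-group manifests from above
kops update cluster --name "${NAME}" --yes
kops validate cluster --name "${NAME}" --wait 15m

# 2. Once the cluster is healthy, hand the worker IGs over to Karpenter and
#    push the change out; the control-plane IG keeps the default manager.
kops edit instancegroup nodes-ap-northeast-1a --name "${NAME}"   # set spec.manager: Karpenter
kops edit instancegroup nodes-ap-northeast-1d --name "${NAME}"   # set spec.manager: Karpenter
kops update cluster --name "${NAME}" --yes
kops validate cluster --name "${NAME}" --wait 10m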

boskiv · Jul 20 '23 12:07

When I create the cluster with the CLI, only the control-plane instance group is created, no node instances come up, and cluster validation fails.

#!/bin/bash
source .env
kops create cluster \
    --zones ${ZONES} \
    --master-count=1 \
    --node-count=3 \
    --control-plane-image ${AMI} \
    --node-image ${AMI} \
    --node-size ${NODE_SIZE} \
    --master-size ${CONTROL_PLANE_SIZE} \
    --instance-manager=karpenter \
    --discovery-store=s3://sf-k8s-oidc-store \
    --networking=amazonvpc \
    --dns-zone ${DNS} \
    --yes
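
To see what that command actually registered before validation times out, something like the following should work (the state store and cluster name are assumed here, since .env is not shown):

export KOPS_STATE_STORE=s3://sf-kops-state-store   # assumed; taken from the config above
export NAME=k8s.kops.uat.aws.xxx.cloud             # assumed; whatever name .env supplies

# Which instance groups exist, and which are Karpenter-managed? If I understand
# the integration right, Karpenter-managed IGs get no ASG, so the node groups
# showing no instances is expected until the control plane and Karpenter run.
kops get instancegroups --name "${NAME}" -o yaml | grep -E 'name:|manager:|role:'

# Dry run: list what kops would still create or change in AWS.
kops update cluster --name "${NAME}"

kops validate cluster --name "${NAME}" --wait 15m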

boskiv · Jul 20 '23 13:07

I'm seeing something similar with the attached config.

My cluster only comes up if I drop:

spec:
  externalDns:
    watchIngress: true

Cluster and IG config YAML:

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: prod.cluster-foo.com
spec:
  api:
    loadBalancer:
      class: Network
      type: Public
  authorization:
    rbac: {}
  awsLoadBalancerController:
    enabled: true
  certManager:
    defaultIssuer: lets-encrypt
    enabled: true
  channel: stable
  cloudProvider: aws
  configBase: s3://prod-cluster-foo-com-state-store/prod.cluster-foo.com
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-us-east-2a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-us-east-2a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  externalDns:
    watchIngress: true
  iam:
    allowContainerRegistry: true
    legacy: false
    useServiceAccountExternalPermissions: true
  karpenter:
    enabled: true
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.27.3
  masterPublicName: api.prod.cluster-foo.com
  networkCIDR: 172.20.0.0/16
  networking:
    cilium:
      enableNodePort: true
  nonMasqueradeCIDR: 100.64.0.0/10
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://prod-cluster-foo-com-oidc-store/prod.cluster-foo.com/discovery/prod.cluster-foo.com
    enableAWSOIDCProvider: true
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - cidr: 172.20.32.0/19
    name: us-east-2a
    type: Public
    zone: us-east-2a
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: prod.cluster-foo.com
  name: control-plane-us-east-2a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230608
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - us-east-2a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: prod.cluster-foo.com
  name: nodes
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230608
  manager: Karpenter
  role: Node
  subnets:
  - us-east-2a

techthumb · Jul 23 '23 21:07

It turns out that if the externalDns key is present, then the provider must be specified as well!

spec:
  externalDns:
    provider: dns-controller
    watchIngress: true
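
For anyone hitting the same thing, a minimal sketch of rolling that change into an existing cluster (cluster name taken from the config above):

# Add "provider: dns-controller" under spec.externalDns, then apply it.
kops edit cluster --name prod.cluster-foo.com
kops update cluster --name prod.cluster-foo.com --yes
kops validate cluster --name prod.cluster-foo.com --wait 10m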

techthumb · Jul 23 '23 22:07

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Jan 25 '24 03:01

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · Feb 24 '24 04:02

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot · Mar 25 '24 04:03

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · Mar 25 '24 04:03