
IG: "kops.k8s.io/instancegroup" property missing under "nodeLabels" for instance groups created via "kops create cluster" command

salavessa opened this issue 1 year ago

/kind bug

1. What kops version are you running? The command kops version will display this information. Tested with Client version: 1.28.4 (git-v1.28.4) and Client version: 1.27.3 (git-v1.27.3)

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag. N/A

3. What cloud provider are you using? AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

$ kops create cluster --cloud=aws --dns=private --zones=us-west-2a --name kops.example.com --dry-run -o yaml
# [...]
---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: kops.example.com
  name: control-plane-us-west-2a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20240126
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - us-west-2a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: kops.example.com
  name: nodes-us-west-2a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20240126
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Node
  subnets:
  - us-west-2a
$ kops --name kops.example.com create instancegroup zzz --dry-run -oyaml
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: kops.example.com
  name: zzz
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20240126
  kubelet:
    anonymousAuth: false
    nodeLabels:
      node-role.kubernetes.io/node: ""
  machineType: t3.medium
  manager: CloudGroup
  maxSize: 2
  minSize: 2
  nodeLabels:
    kops.k8s.io/instancegroup: zzz
  role: Node
  subnets:
  - us-west-2a

5. What happened after the commands executed? Check Answer 4.

6. What did you expect to happen? I would expect all properties, especially kops.k8s.io/instancegroup under nodeLabels, to also be created when using the kops create cluster command, the same way kops create instancegroup does. The whole kubelet property is also missing when creating a cluster, so ideally all "default" properties would be aligned between the create cluster and create instancegroup commands.
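For comparison, these are the properties that appear in the kops create instancegroup output above but are missing from the nodes IG generated by kops create cluster (snippet assembled from the output in point 4; the IG name is just an example):

  spec:
    kubelet:
      anonymousAuth: false
      nodeLabels:
        node-role.kubernetes.io/node: ""
    nodeLabels:
      kops.k8s.io/instancegroup: nodes-us-west-2a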

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information. Check Answer 4.

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know? I have many existing clusters (k8s v1.27, upgraded many times) with the kops.k8s.io/instancegroup node label set, so this may have worked before, or the label may have been set as part of a previous kops upgrade.

salavessa commented on Feb 24 '24

Is this a bug?

Without too much understanding of the codebase design, I notice there's a fallback to the "node" role type. If I'm not off-track, this is more of an enhancement (addressing a null-check anti-pattern in a hotspot of the codebase) than a bug, so perhaps nothing to worry about.

teocns commented on Mar 02 '24

@teocns Not sure if I understand your comment, but the issue is not related to the actual node type; that works just fine.

The issue is that the node-level label which identifies the kops instance group of each specific node is missing when using the kops create cluster command. We can manually add it afterwards, but I wouldn't expect to have to do that, especially because the nodeLabels entry is automatically injected when you create a new instance group (as described in point 4), and also because this was "working" at some point before (nodes from clusters created with older kops versions do contain the label).

For our environments this was a breaking change (we had to manually update and roll out the IGs) because we actively use things like Kubernetes (anti-)affinity rules and Prometheus metrics that rely on the value of the kops.k8s.io/instancegroup node label.
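For example, a scheduling rule of this kind stops matching when the label is missing (illustrative pod spec fragment; the IG name is just a placeholder):

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kops.k8s.io/instancegroup
            operator: In
            values:
            - nodes-us-west-2a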

The actual YAML property that I would expect to be present for each IG created via the kops create cluster command is 👇

  nodeLabels:
    kops.k8s.io/instancegroup: <IG_NAME>
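
As a workaround we did something along these lines for the existing IGs (standard kops commands; cluster and IG names are examples):

$ kops edit instancegroup nodes-us-west-2a --name kops.example.com
# add the nodeLabels entry above to the spec, then apply and roll out:
$ kops update cluster --name kops.example.com --yes
$ kops rolling-update cluster --name kops.example.com --yes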

salavessa commented on Mar 02 '24

Gotcha, you rely on the label as an affinity selector within your own workflow, while my observation was oriented more towards kops' own functional integrity. Thanks for clarifying.

teocns commented on Mar 02 '24

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented on May 31 '24

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented on Jun 30 '24

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot commented on Jul 30 '24

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to the k8s-triage-robot's /close not-planned comment above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot commented on Jul 30 '24