Worker node role can't be set

Open vhurtevent opened this issue 2 years ago • 10 comments

Bug Report

When creating a cluster, I want the worker nodes to have an explicit role, as displayed by the kubectl describe node command.

I tried to set the worker role by setting node labels in the machine config spec:

machine:
  nodeLabels:
    node-role.kubernetes.io/worker: "true"

When querying NodeLabel with talosctl, the label exists:

But the labels aren't set on the nodes, and their role is still <none>.

Logs

In the logs, we can see this error:

[ 83.519643] [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\tnodes \"dbaas1-worker-0\" is forbidden: is not allowed to modify labels: node-role.kubernetes.io/worker"}

Looks like a protected domain label, but how can we set a role through Talos node provisioning?

Environment

  • Talos version: 1.3.2
  • Kubernetes version: 1.24.9
  • Platform: OpenStack

vhurtevent avatar Jan 17 '23 22:01 vhurtevent

This label is not allowed to be set by the kubelet, and it is similarly unsafe for Talos to set it. Allowing this would let a worker node promote itself and potentially gain access to privileges it shouldn't have.

andrewrynhard avatar Jan 17 '23 22:01 andrewrynhard

Hello @andrewrynhard,

Thank you for your answer. I understand the security problem.

In my use case, I would like to distinguish worker nodes, which only execute workloads, from edge nodes, which I dedicate to running Ingress controllers and which are the only backend members of my L4 load balancers.

Do you suggest that I drop node-role.kubernetes.io/<any role> and use a fully custom label domain and value, which could be set by Talos through the machine.nodeLabels spec?

Thank you

vhurtevent avatar Jan 18 '23 07:01 vhurtevent

You can set this label outside of Talos, as the last provisioning step, or set the node label itself to something like "my.dev/role" and have something with appropriate permissions add a matching node-role label. But by Kubernetes design, a worker node can't put a role label on itself, so there has to be something else, running either in the cluster or outside of it, that does that.
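As a rough illustration of that second option, here is a minimal Python sketch of the decision such an external, suitably privileged helper might make. It is not part of Talos or kubectl: the function name role_label_command is hypothetical, "my.dev/role" is just the example label from above, and the value "true" simply mirrors the machine config earlier in this thread. The sketch only builds the kubectl command string and does not run it.

```python
# Hypothetical helper: CUSTOM_ROLE_LABEL and role_label_command are
# illustrative assumptions, not part of Talos or kubectl.
CUSTOM_ROLE_LABEL = "my.dev/role"
ROLE_PREFIX = "node-role.kubernetes.io/"


def role_label_command(node_name, labels):
    """Return the kubectl command that adds the matching node-role label.

    Returns None when the node carries no custom role label, or when the
    matching node-role label is already present.
    """
    role = labels.get(CUSTOM_ROLE_LABEL)
    if not role:
        return None
    target = ROLE_PREFIX + role
    if target in labels:
        return None
    return f"kubectl label node {node_name} {target}=true"


if __name__ == "__main__":
    # Labels as they might look after Talos applied machine.nodeLabels.
    print(role_label_command("dbaas1-worker-0", {"my.dev/role": "worker"}))
```

The resulting command would have to be run by an identity that is allowed to modify node-role labels (for example a cluster admin), not by the node itself.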

smira avatar Jan 18 '23 10:01 smira

Can we add node label validation for this?

As far as I know, these labels can be set by the kubelet:

node-role.kubernetes.io
kubernetes.io/role

sergelogvinov avatar Jan 23 '23 14:01 sergelogvinov

Adding validation to catch this configuration error would be very much appreciated, as I didn't realize this.

Adding special handling would also be very nice, but I think that would have to be some special handling of talosctl parsing a machine's configuration, rather than Talos itself doing that.

nogweii avatar Jun 03 '24 16:06 nogweii

Hi @nogweii, try using the edge version of Talos CCM: https://github.com/siderolabs/talos-cloud-controller-manager/blob/main/docs/config.md

sergelogvinov avatar Jun 03 '24 16:06 sergelogvinov

Interesting! @sergelogvinov , not to go too off-topic, does talos-ccm work in a bare-metal cluster, running in a homelab? (I'm running a Talos cluster on a Turing Pi 2 with RK1 compute modules.)

nogweii avatar Jun 03 '24 17:06 nogweii

Talos CCM works inside a Talos cluster :) It does not matter whether Talos is in a cloud or on bare metal.

sergelogvinov avatar Jun 03 '24 17:06 sergelogvinov

I'm unable to set any nodeLabels when bootstrapping worker nodes.

I'm using Talm to set up the worker node, but I don't think the issue is on Talm's side, because I can see the nodeLabels values in the machineConfiguration through the talosctl command.

Reproduce the issue

  • Tested on Talos v1.7.4
  • Issue only concerns Worker machine type. It works as expected with Controlplane type.

1. Reset the worker node, then apply the configuration

machine:
  nodeLabels:
    node.cloudprovider.kubernetes.io/platform: proxmox
    topology.kubernetes.io/region: Region-1
    topology.kubernetes.io/zone: pve03
    # truncated

talm apply -f nodes/worker-01.yaml -i

2. Wait for the worker node to join the cluster and describe the node labels

kubectl describe node worker-01
Name:               worker-01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=worker-01
                    kubernetes.io/os=linux
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 20 Jun 2024 22:52:21 +0200
Taints:             <none>
Unschedulable:      false

3. Ensure nodeLabels is correctly set in machineConfiguration

talosctl get mc --nodes 192.168.100.21 -e 192.168.100.21 --talosconfig=./talosconfig -oyaml |yq -r '.spec.machine.nodeLabels'
node.cloudprovider.kubernetes.io/platform: proxmox
topology.kubernetes.io/region: Region-1
topology.kubernetes.io/zone: pve03

Workaround: Set the labels via kubectl after the nodes join the cluster

kubectl label node worker-01 node.cloudprovider.kubernetes.io/platform=proxmox
kubectl label node worker-01 topology.kubernetes.io/region=Region-1
kubectl label node worker-01 topology.kubernetes.io/zone=pve03

I can open a new issue if needed.

mydoomfr avatar Jun 20 '24 21:06 mydoomfr

Please see the NodeRestriction documentation: it is enabled by default on the Kubernetes side, and there's nothing we can do on the Talos side to work around it.

If you use labels which are not restricted, the Kubernetes API server will allow them to be set. But in this case Talos Linux has the same level of access as the kubelet running on the node.

There might be some better way to do config validation/documentation, but there is no "fix" whatsoever, except for changing the admission controller rules.
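To make the restriction concrete, here is a minimal Python sketch of the label rules as described in the Kubernetes NodeRestriction documentation. It is not Talos code and not the admission plugin itself: the function names are hypothetical, and the allow-lists are assumptions copied from that documentation, so they may differ between Kubernetes releases.

```python
# Sketch of the NodeRestriction label rules; the sets below are assumptions
# taken from the Kubernetes documentation and may change between releases.
KUBELET_ALLOWED_LABELS = {
    "kubernetes.io/hostname",
    "kubernetes.io/arch",
    "kubernetes.io/os",
    "beta.kubernetes.io/arch",
    "beta.kubernetes.io/os",
    "beta.kubernetes.io/instance-type",
    "node.kubernetes.io/instance-type",
    "failure-domain.beta.kubernetes.io/region",
    "failure-domain.beta.kubernetes.io/zone",
    "topology.kubernetes.io/region",
    "topology.kubernetes.io/zone",
}
KUBELET_ALLOWED_NAMESPACES = ("kubelet.kubernetes.io", "node.kubernetes.io")
RESERVED_DOMAINS = ("kubernetes.io", "k8s.io")


def _in_domain(namespace, domain):
    """True if namespace equals domain or is a subdomain of it."""
    return namespace == domain or namespace.endswith("." + domain)


def kubelet_may_set(label_key):
    """Hypothetical check: may a node set this label on itself?"""
    if label_key in KUBELET_ALLOWED_LABELS:
        return True
    # The namespace is the part before "/", empty for un-prefixed labels.
    namespace = label_key.split("/", 1)[0] if "/" in label_key else ""
    if _in_domain(namespace, "node-restriction.kubernetes.io"):
        return False
    if any(_in_domain(namespace, d) for d in KUBELET_ALLOWED_NAMESPACES):
        return True
    # Everything else under kubernetes.io or k8s.io is reserved.
    return not any(_in_domain(namespace, d) for d in RESERVED_DOMAINS)


def rejected_labels(node_labels):
    """Return the label keys from a nodeLabels mapping that would be refused."""
    return [key for key in node_labels if not kubelet_may_set(key)]


if __name__ == "__main__":
    print(rejected_labels({
        "node.cloudprovider.kubernetes.io/platform": "proxmox",
        "topology.kubernetes.io/region": "Region-1",
        "topology.kubernetes.io/zone": "pve03",
    }))
```

Under these assumed rules, the sketch reports only node.cloudprovider.kubernetes.io/platform from the configuration above as restricted, while a custom domain such as my.dev/role passes. A check along these lines could run before applying a machine config, but it only approximates what the API server actually enforces.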

smira avatar Jun 28 '24 17:06 smira

Just pointing out that the docs are still somewhat lacking on this. For a worker/storage node I had to dig up the kubelet args and set

machine:
  kubelet:
    extraArgs:
      node-labels: "node.kubernetes.io/instance-type=ceph-storage"
      register-with-taints: "node.kubernetes.io/instance-type=ceph-storage:NoSchedule"

instead of using the intuitive

machine:
  nodeLabels:
  nodeTaints:

Maybe those would work on a controlplane node?

especially-relative avatar Jan 05 '25 17:01 especially-relative

Also, extraConfig can be used for taints (looks better IMO), but not for labels:

machine:
  kubelet:
    extraConfig:
      registerWithTaints:
        - key: hello
          effect: NoSchedule

rgeraskin avatar Feb 13 '25 23:02 rgeraskin

Is this information still missing from the docs? I'm assuming it's still relevant even though it's a bit old.

rothgar avatar Apr 22 '25 05:04 rothgar