k8s-device-plugin
[Issue]: Unable to Update ( 1.25.2.7 → 1.25.2.8 )
Problem Description
I have 3 nodes, all with the same hardware spec, running Kubernetes on Talos. amd-device-plugin is deployed as a DaemonSet using a Helm chart. On tag 1.25.2.3 everything works: each node has access to its iGPU, and the GPU can be assigned to a pod.
kubectl -n kube-system get pods -o wide
NAME                          READY   STATUS    RESTARTS   AGE   IP            NODE              NOMINATED NODE   READINESS GATES
amd-device-plugin-b5gsh       1/1     Running   0          15h   10.69.2.122   black-knight-02   <none>           <none>
amd-device-plugin-d5rrd       1/1     Running   0          15h   10.69.0.180   black-knight-03   <none>           <none>
amd-device-plugin-sf25x       1/1     Running   0          15h   10.69.1.42    black-knight-01   <none>           <none>
amd-gpu-node-labeller-g8ntt   1/1     Running   0          8h    10.69.1.30    black-knight-01   <none>           <none>
amd-gpu-node-labeller-xqvf8   1/1     Running   0          8h    10.69.0.220   black-knight-03   <none>           <none>
amd-gpu-node-labeller-zz7wk   1/1     Running   0          8h    10.69.2.227   black-knight-02   <none>           <none>
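When I attempt to upgrade to any tag later than 1.25.2.3, amd-device-plugin fails to start on node 3 (black-knight-03) and goes into CrashLoopBackOff, while the pods on the other two nodes run normally. From what I can tell, the image is picking up the wrong system architecture on that node?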
kubectl -n kube-system get pods -o wide
NAME                          READY   STATUS             RESTARTS      AGE   IP            NODE              NOMINATED NODE   READINESS GATES
amd-device-plugin-6h7tt       0/1     CrashLoopBackOff   2 (15s ago)   29s   10.69.0.201   black-knight-03   <none>           <none>
amd-device-plugin-l956d       1/1     Running            0             29s   10.69.2.219   black-knight-02   <none>           <none>
amd-device-plugin-nqv5f       1/1     Running            0             29s   10.69.1.219   black-knight-01   <none>           <none>
amd-gpu-node-labeller-h2l6w   1/1     Running            0             29s   10.69.0.25    black-knight-03   <none>           <none>
amd-gpu-node-labeller-kzw9q   1/1     Running            0             29s   10.69.2.157   black-knight-02   <none>           <none>
amd-gpu-node-labeller-sdhg5   1/1     Running            0             29s   10.69.1.9     black-knight-01   <none>           <none>
kubectl describe pod amd-device-plugin-6h7tt -n kube-system
Name: amd-device-plugin-6h7tt
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Service Account: default
Node: black-knight-03/10.0.10.27
Start Time: Tue, 08 Oct 2024 09:19:09 +0000
Labels: app.kubernetes.io/component=amd-device-plugin
app.kubernetes.io/instance=amd-device-plugin
app.kubernetes.io/name=amd-device-plugin
controller-revision-hash=599d6ffccd
pod-template-generation=34
Annotations: <none>
Status: Running
IP: 10.69.0.201
IPs:
IP: 10.69.0.201
Controlled By: DaemonSet/amd-device-plugin
Containers:
app:
Container ID: containerd://aa040e5b78f93ad1bb16b2d032348941f0f10de1a71c347b66cc313a74be9e1a
Image: docker.io/rocm/k8s-device-plugin:1.25.2.8
Image ID: docker.io/rocm/k8s-device-plugin@sha256:f3835498cf2274e0a07c32b38c166c05a876f8eb776d756cc06805e599a3ba5f
Port: <none>
Host Port: <none>
Command:
./k8s-device-plugin
Args:
-logtostderr=true
-stderrthreshold=INFO
-v=5
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Tue, 08 Oct 2024 09:19:51 +0000
Finished: Tue, 08 Oct 2024 09:19:51 +0000
Ready: False
Restart Count: 3
Limits:
memory: 100Mi
Requests:
cpu: 10m
memory: 10Mi
Environment:
TZ: Pacific/Auckland
Mounts:
/sys from sys (rw)
/var/lib/kubelet/device-plugins from device-plugins (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xrdfd (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
device-plugins:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/device-plugins
HostPathType:
sys:
Type: HostPath (bare host directory volume)
Path: /sys
HostPathType:
kube-api-access-xrdfd:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: feature.node.kubernetes.io/pci-0300_1002.present=true
kubernetes.io/arch=amd64
Tolerations: CriticalAddonsOnly op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 60s default-scheduler Successfully assigned kube-system/amd-device-plugin-6h7tt to black-knight-03
Normal Pulled 19s (x4 over 60s) kubelet Container image "docker.io/rocm/k8s-device-plugin:1.25.2.8" already present on machine
Normal Created 19s (x4 over 60s) kubelet Created container app
Normal Started 19s (x4 over 60s) kubelet Started container app
Warning BackOff 7s (x5 over 58s) kubelet Back-off restarting failed container app in pod amd-device-plugin-6h7tt_kube-system(1d6ae128-c781-41ab-b106-e659b1464cfa)
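The events above show that the kubelet on black-knight-03 used a copy of docker.io/rocm/k8s-device-plugin:1.25.2.8 that was "already present on machine" instead of pulling it again. One way to check whether all three nodes resolved that tag to the same image is to compare the imageID each pod reports in its container status. This is a minimal sketch, not part of the device plugin or kubectl; the script and function names are hypothetical, and it assumes the input is the JSON from `kubectl -n kube-system get pods -l app.kubernetes.io/name=amd-device-plugin -o json` saved to a file.

```python
import json
import sys
from collections import defaultdict


def digests_by_node(pod_list: dict) -> dict[str, set[str]]:
    """Map each node name to the image IDs reported by the pods running on it."""
    result = defaultdict(set)
    for pod in pod_list.get("items", []):
        node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
        # containerStatuses is absent until the kubelet has reported on the pod
        for status in pod.get("status", {}).get("containerStatuses") or []:
            result[node].add(status.get("imageID", ""))
    return dict(result)


def report(pod_list: dict) -> list[str]:
    """One line per node, plus a warning if the nodes disagree on the image ID."""
    by_node = digests_by_node(pod_list)
    lines = [f"{node}: {', '.join(sorted(ids))}" for node, ids in sorted(by_node.items())]
    if len(set().union(*by_node.values())) > 1:
        lines.append("WARNING: pods are running different image digests")
    return lines


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 check_digests.py pods.json
    with open(sys.argv[1]) as f:
        print("\n".join(report(json.load(f))))
```

A mismatch would be a lead worth following up (for example by removing the cached image on that node and letting it pull again), not proof of the cause.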
kubectl -n kube-system logs amd-device-plugin-6h7tt -f
exec ./k8s-device-plugin: exec format error
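"exec format error" is the kernel's ENOEXEC error: it refused to execute the file, which typically means the binary was built for a different CPU architecture or is not a valid executable at all (for example, truncated or corrupted). A minimal way to check which architecture a copy of k8s-device-plugin targets is to read the e_machine field of its ELF header. The sketch below assumes the binary has already been copied out of the image (for example with `docker create` followed by `docker cp`; its exact path inside the image is not shown in this issue), and the script and function names are hypothetical.

```python
import struct
import sys

# A few e_machine values from the ELF specification (see also <elf.h>)
E_MACHINE = {
    3: "x86 (i386)",
    40: "ARM",
    62: "x86-64 (amd64)",
    183: "AArch64 (arm64)",
    243: "RISC-V",
}


def elf_arch(header: bytes) -> str:
    """Describe the target architecture of an ELF file from its first bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (bad magic or truncated header)")
    ei_class, ei_data = header[4], header[5]  # EI_CLASS, EI_DATA
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unknown ELF class or data encoding")
    # e_machine is a 16-bit field at offset 18; EI_DATA gives its byte order
    byte_order = "<" if ei_data == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    name = E_MACHINE.get(machine, f"unknown e_machine {machine}")
    return f"{name}, {32 if ei_class == 1 else 64}-bit"


def read_elf_arch(path: str) -> str:
    with open(path, "rb") as f:
        return elf_arch(f.read(64))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 elf_arch.py ./k8s-device-plugin
    try:
        print(read_elf_arch(sys.argv[1]))
    except ValueError as err:
        print(f"{sys.argv[1]}: {err}")
```

On these amd64 nodes, a usable binary should report `x86-64 (amd64), 64-bit`.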
kubectl describe daemonset amd-device-plugin -n kube-system
Name: amd-device-plugin
Selector: app.kubernetes.io/component=amd-device-plugin,app.kubernetes.io/instance=amd-device-plugin,app.kubernetes.io/name=amd-device-plugin
Node-Selector: feature.node.kubernetes.io/pci-0300_1002.present=true,kubernetes.io/arch=amd64
Labels: app.kubernetes.io/component=amd-device-plugin
app.kubernetes.io/instance=amd-device-plugin
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=amd-device-plugin
helm.sh/chart=app-template-3.5.0
helm.toolkit.fluxcd.io/name=amd-device-plugin
helm.toolkit.fluxcd.io/namespace=kube-system
Annotations: deprecated.daemonset.template.generation: 34
meta.helm.sh/release-name: amd-device-plugin
meta.helm.sh/release-namespace: kube-system
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 2
Number of Nodes Misscheduled: 0
Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app.kubernetes.io/component=amd-device-plugin
app.kubernetes.io/instance=amd-device-plugin
app.kubernetes.io/name=amd-device-plugin
Service Account: default
Containers:
app:
Image: docker.io/rocm/k8s-device-plugin:1.25.2.8
Port: <none>
Host Port: <none>
Command:
./k8s-device-plugin
Args:
-logtostderr=true
-stderrthreshold=INFO
-v=5
Limits:
memory: 100Mi
Requests:
cpu: 10m
memory: 10Mi
Environment:
TZ: Pacific/Auckland
Mounts:
/sys from sys (rw)
/var/lib/kubelet/device-plugins from device-plugins (rw)
Volumes:
device-plugins:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/device-plugins
HostPathType:
sys:
Type: HostPath (bare host directory volume)
Path: /sys
HostPathType:
Priority Class Name: system-node-critical
Node-Selectors: feature.node.kubernetes.io/pci-0300_1002.present=true
kubernetes.io/arch=amd64
Tolerations: CriticalAddonsOnly op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulDelete 59m daemonset-controller Deleted pod: amd-device-plugin-qv4kv
Normal SuccessfulDelete 59m daemonset-controller Deleted pod: amd-device-plugin-2m5bz
Normal SuccessfulCreate 59m daemonset-controller Created pod: amd-device-plugin-c8cn4
Normal SuccessfulCreate 59m daemonset-controller Created pod: amd-device-plugin-nkg7f
Normal SuccessfulCreate 54m daemonset-controller Created pod: amd-device-plugin-2kldd
Normal SuccessfulCreate 54m daemonset-controller Created pod: amd-device-plugin-lgfdf
Normal SuccessfulDelete 54m daemonset-controller Deleted pod: amd-device-plugin-nkg7f
Normal SuccessfulDelete 54m daemonset-controller Deleted pod: amd-device-plugin-c8cn4
Normal SuccessfulDelete 53m daemonset-controller Deleted pod: amd-device-plugin-rv4l8
Normal SuccessfulDelete 53m daemonset-controller Deleted pod: amd-device-plugin-2kldd
Normal SuccessfulCreate 53m daemonset-controller Created pod: amd-device-plugin-xbtbm
Normal SuccessfulCreate 53m daemonset-controller Created pod: amd-device-plugin-4mgq8
Normal SuccessfulDelete 48m daemonset-controller Deleted pod: amd-device-plugin-xbtbm
Normal SuccessfulDelete 48m daemonset-controller Deleted pod: amd-device-plugin-4mgq8
Normal SuccessfulCreate 48m daemonset-controller Created pod: amd-device-plugin-w2xnm
Normal SuccessfulCreate 48m daemonset-controller Created pod: amd-device-plugin-tc486
Normal SuccessfulDelete 48m daemonset-controller Deleted pod: amd-device-plugin-lgfdf
Normal SuccessfulCreate 48m daemonset-controller Created pod: amd-device-plugin-n8bkz
Normal SuccessfulCreate 43m (x21 over 28d) daemonset-controller (combined from similar events): Created pod: amd-device-plugin-79rbf
Normal SuccessfulDelete 10m (x38 over 28d) daemonset-controller (combined from similar events): Deleted pod: amd-device-plugin-z6n7b
talosctl dmesg -n black-knight-02 | grep -i amdgpu
black-knight-02: user: warning: [2024-10-08T04:00:29.037121313Z]: [talos] [initramfs] enabling system extension amdgpu-firmware 20240513
black-knight-02: kern: info: [2024-10-08T04:00:34.208534313Z]: [drm] amdgpu kernel modesetting enabled.
black-knight-02: kern: info: [2024-10-08T04:00:34.216765313Z]: amdgpu: Virtual CRAT table created for CPU
black-knight-02: kern: info: [2024-10-08T04:00:34.217421313Z]: amdgpu: Topology: Add CPU node
black-knight-02: kern: info: [2024-10-08T04:00:34.218049313Z]: amdgpu 0000:e5:00.0: enabling device (0006 -> 0007)
black-knight-02: kern: info: [2024-10-08T04:00:34.229941313Z]: amdgpu 0000:e5:00.0: amdgpu: Fetched VBIOS from VFCT
black-knight-02: kern: info: [2024-10-08T04:00:34.230669313Z]: amdgpu: ATOM BIOS: 113-REMBRANDT-X37
black-knight-02: kern: info: [2024-10-08T04:00:34.233905313Z]: amdgpu 0000:e5:00.0: vgaarb: deactivate vga console
black-knight-02: kern: info: [2024-10-08T04:00:34.234649313Z]: amdgpu 0000:e5:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
black-knight-02: kern: info: [2024-10-08T04:00:34.236985313Z]: amdgpu 0000:e5:00.0: amdgpu: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
black-knight-02: kern: info: [2024-10-08T04:00:34.238133313Z]: amdgpu 0000:e5:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
black-knight-02: kern: info: [2024-10-08T04:00:34.239159313Z]: amdgpu 0000:e5:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
black-knight-02: kern: info: [2024-10-08T04:00:34.241456313Z]: [drm] amdgpu: 512M of VRAM memory ready
black-knight-02: kern: info: [2024-10-08T04:00:34.242076313Z]: [drm] amdgpu: 31762M of GTT memory ready.
black-knight-02: kern: info: [2024-10-08T04:00:34.247748313Z]: amdgpu 0000:e5:00.0: amdgpu: Will use PSP to load VCN firmware
black-knight-02: kern: info: [2024-10-08T04:00:34.425503313Z]: amdgpu 0000:e5:00.0: amdgpu: RAS: optional ras ta ucode is not available
black-knight-02: kern: info: [2024-10-08T04:00:34.437710313Z]: amdgpu 0000:e5:00.0: amdgpu: RAP: optional rap ta ucode is not available
black-knight-02: kern: info: [2024-10-08T04:00:34.438651313Z]: amdgpu 0000:e5:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
black-knight-02: kern: info: [2024-10-08T04:00:34.442337313Z]: amdgpu 0000:e5:00.0: amdgpu: SMU is initialized successfully!
black-knight-02: kern: info: [2024-10-08T04:00:34.459122313Z]: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
black-knight-02: kern: info: [2024-10-08T04:00:34.459818313Z]: kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
black-knight-02: kern: info: [2024-10-08T04:00:34.461457313Z]: amdgpu: Virtual CRAT table created for GPU
black-knight-02: kern: info: [2024-10-08T04:00:34.462616313Z]: amdgpu: Topology: Add dGPU node [0x1681:0x1002]
black-knight-02: kern: info: [2024-10-08T04:00:34.463288313Z]: kfd kfd: amdgpu: added device 1002:1681
black-knight-02: kern: info: [2024-10-08T04:00:34.463891313Z]: amdgpu 0000:e5:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 6, active_cu_number 12
black-knight-02: kern: info: [2024-10-08T04:00:34.465051313Z]: amdgpu 0000:e5:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
black-knight-02: kern: info: [2024-10-08T04:00:34.465961313Z]: amdgpu 0000:e5:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
black-knight-02: kern: info: [2024-10-08T04:00:34.466875313Z]: amdgpu 0000:e5:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
black-knight-02: kern: info: [2024-10-08T04:00:34.467788313Z]: amdgpu 0000:e5:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
black-knight-02: kern: info: [2024-10-08T04:00:34.468704313Z]: amdgpu 0000:e5:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
black-knight-02: kern: info: [2024-10-08T04:00:34.469631313Z]: amdgpu 0000:e5:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
black-knight-02: kern: info: [2024-10-08T04:00:34.470554313Z]: amdgpu 0000:e5:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
black-knight-02: kern: info: [2024-10-08T04:00:34.471475313Z]: amdgpu 0000:e5:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
black-knight-02: kern: info: [2024-10-08T04:00:34.472396313Z]: amdgpu 0000:e5:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
black-knight-02: kern: info: [2024-10-08T04:00:34.473332313Z]: amdgpu 0000:e5:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
black-knight-02: kern: info: [2024-10-08T04:00:34.474289313Z]: amdgpu 0000:e5:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
black-knight-02: kern: info: [2024-10-08T04:00:34.475190313Z]: amdgpu 0000:e5:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
black-knight-02: kern: info: [2024-10-08T04:00:34.476126313Z]: amdgpu 0000:e5:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
black-knight-02: kern: info: [2024-10-08T04:00:34.477095313Z]: amdgpu 0000:e5:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
black-knight-02: kern: info: [2024-10-08T04:00:34.478061313Z]: amdgpu 0000:e5:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
black-knight-02: kern: info: [2024-10-08T04:00:34.480279313Z]: [drm] Initialized amdgpu 3.54.0 20150101 for 0000:e5:00.0 on minor 0
black-knight-02: kern: info: [2024-10-08T04:00:34.488976313Z]: amdgpu 0000:e5:00.0: [drm] Cannot find any crtc or sizes
talosctl dmesg -n black-knight-03 | grep -i amdgpu
Operating System
Talos v1.8.0
CPU
AMD Ryzen 7 PRO 6850U with Radeon Graphics
GPU
AMD Radeon 680M (integrated; rocminfo reports it as gfx1035, device ID 0x1681)
ROCm Version
ROCm 6.2.0
ROCm Component
No response
Steps to Reproduce
Upgrade docker.io/rocm/k8s-device-plugin from 1.25.2.3 to 1.25.2.8; the pod on black-knight-03 then goes into CrashLoopBackOff.
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 PRO 6850U with Radeon Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 PRO 6850U with Radeon Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4768
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 65047108(0x3e08a44) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 65047108(0x3e08a44) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 65047108(0x3e08a44) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1035
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 2048(0x800) KB
Chip ID: 5761(0x1681)
ASIC Revision: 2(0x2)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2200
BDFID: 58624
Internal Node ID: 1
Compute Unit: 12
SIMDs per CU: 2
Shader Engines: 1
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties: APU
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 118
SDMA engine uCode:: 47
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 524288(0x80000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 524288(0x80000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1035
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
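Two values in the GPU agent above can be cross-checked against the rest of this report. Assuming rocminfo's BDFID uses the usual PCI packing of bus, device, and function (bus << 8 | device << 3 | function), 58624 decodes to e5:00.0, the same PCI address that amdgpu binds to in the black-knight-02 dmesg output, and Chip ID 5761 is 0x1681, matching the beta.amd.com/gpu.device-id.1681 node label. A short sketch of that arithmetic (decode_bdfid is a hypothetical helper):

```python
def decode_bdfid(bdfid: int) -> str:
    """Split a packed PCI location (bus << 8 | device << 3 | function) into bus:dev.fn."""
    bus = (bdfid >> 8) & 0xFF
    device = (bdfid >> 3) & 0x1F
    function = bdfid & 0x7
    return f"{bus:02x}:{device:02x}.{function}"


# Values copied from the rocminfo GPU agent above
print(decode_bdfid(58624))  # → e5:00.0
print(f"{5761:#06x}")       # → 0x1681
```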
Additional Information
kubectl get nodes -o wide
NAME              STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
black-knight-01   Ready    control-plane   53d   v1.30.5   10.0.10.25    <none>        Talos (v1.8.0)   6.6.52-talos     containerd://2.0.0-rc.4
black-knight-02   Ready    control-plane   53d   v1.30.5   10.0.10.26    <none>        Talos (v1.8.0)   6.6.52-talos     containerd://2.0.0-rc.4
black-knight-03   Ready    control-plane   53d   v1.30.5   10.0.10.27    <none>        Talos (v1.8.0)   6.6.52-talos     containerd://2.0.0-rc.4
kubectl version
Client Version: v1.31.1
Kustomize Version: v5.4.2
Server Version: v1.30.5
Kubecolor Version: 0.4.0
kubectl get no -o json | jq ".items[].metadata.labels"
{
"beta.amd.com/gpu.cu-count.12": "1",
"beta.amd.com/gpu.device-id.1681": "1",
"beta.amd.com/gpu.simd-count.24": "1",
"beta.amd.com/gpu.vram.1G": "1",
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"extensions.talos.dev/amd-ucode": "20240909",
"extensions.talos.dev/amdgpu-firmware": "20240909",
"extensions.talos.dev/modules.dep": "6.6.52-talos",
"extensions.talos.dev/realtek-firmware": "20240909",
"extensions.talos.dev/thunderbolt": "v1.8.0",
"feature.node.kubernetes.io/pci-0300_1002.present": "true",
"kubernetes.io/arch": "amd64",
"kubernetes.io/hostname": "black-knight-01",
"kubernetes.io/os": "linux",
"node-role.kubernetes.io/control-plane": ""
}
{
"beta.amd.com/gpu.cu-count.12": "1",
"beta.amd.com/gpu.device-id.1681": "1",
"beta.amd.com/gpu.simd-count.24": "1",
"beta.amd.com/gpu.vram.1G": "1",
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"extensions.talos.dev/amd-ucode": "20240909",
"extensions.talos.dev/amdgpu-firmware": "20240909",
"extensions.talos.dev/modules.dep": "6.6.52-talos",
"extensions.talos.dev/realtek-firmware": "20240909",
"extensions.talos.dev/thunderbolt": "v1.8.0",
"feature.node.kubernetes.io/pci-0300_1002.present": "true",
"kubernetes.io/arch": "amd64",
"kubernetes.io/hostname": "black-knight-02",
"kubernetes.io/os": "linux",
"node-role.kubernetes.io/control-plane": ""
}
{
"beta.amd.com/gpu.cu-count.12": "1",
"beta.amd.com/gpu.device-id.1681": "1",
"beta.amd.com/gpu.simd-count.24": "1",
"beta.amd.com/gpu.vram.1G": "1",
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"extensions.talos.dev/amd-ucode": "20240909",
"extensions.talos.dev/amdgpu-firmware": "20240909",
"extensions.talos.dev/modules.dep": "6.6.52-talos",
"extensions.talos.dev/realtek-firmware": "20240909",
"extensions.talos.dev/thunderbolt": "v1.8.0",
"feature.node.kubernetes.io/pci-0300_1002.present": "true",
"kubernetes.io/arch": "amd64",
"kubernetes.io/hostname": "black-knight-03",
"kubernetes.io/os": "linux",
"node-role.kubernetes.io/control-plane": ""
}
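Apart from kubernetes.io/hostname, the label sets above are identical on all three nodes, including kubernetes.io/arch: amd64, so the DaemonSet's kubernetes.io/arch=amd64 node selector matches every node. To repeat that comparison on a larger or changing cluster, the following sketch (hypothetical script and function names) reads the full output of `kubectl get nodes -o json` saved to a file, not the jq-filtered labels, and lists every label whose value is not the same on all nodes.

```python
import json
import sys
from typing import Optional


def differing_labels(node_list: dict) -> dict[str, dict[str, Optional[str]]]:
    """Return labels whose value is not identical on every node, keyed by label."""
    labels_per_node = {
        n["metadata"]["name"]: n["metadata"].get("labels", {})
        for n in node_list.get("items", [])
    }
    all_keys = set().union(*(lbls.keys() for lbls in labels_per_node.values()))
    diffs = {}
    for key in sorted(all_keys):
        # A node missing the label shows up as None, which also counts as a difference
        values = {name: lbls.get(key) for name, lbls in labels_per_node.items()}
        if len(set(values.values())) > 1:
            diffs[key] = values
    return diffs


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: kubectl get nodes -o json > nodes.json && python3 label_diff.py nodes.json
    with open(sys.argv[1]) as f:
        for key, values in differing_labels(json.load(f)).items():
            print(key, values)
```

For the three nodes in this report, it should list only kubernetes.io/hostname.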
kubectl get nodes -o=jsonpath='{.items[*].status.nodeInfo.architecture}'
amd64 amd64 amd64