DRBD extension not working on arm / rock64

Open Ulrar opened this issue 2 years ago • 5 comments

Hi,

Apologies, I don't really know how to debug this, but on my rock64, when using the DRBD extension, I seem to be missing the drbd module:

talosctl --talosconfig talosconfig -n mynode list /lib/modules/6.1.58-talos/extras
NODE   NAME
1 error occurred:
 rpc error: code = Unknown desc = lstat /lib/modules/6.1.58-talos/extras: no such file or directory

The same config deployed on x86 machines does have that directory populated with the .ko files, as expected. To be sure, I tried using the tag and also the specific arm hash from here, but no luck when "upgrading" to the same version to rebuild the initramfs.

I can't access the display for that node, and I'm not sure which service log might explain why this is failing. Thanks

Ulrar, Oct 25 '23 19:10

There isn't enough information in the ticket. Is the drbd extension installed? Does it match Talos version?

smira, Oct 26 '23 10:10

> There isn't enough information in the ticket. Is the drbd extension installed? Does it match Talos version?

Since the directory isn't present on the host I assume it's not, but I don't know how else to check. Linstor definitely isn't finding the DRBD module in any case, so it's not just a path issue.

Yes, it is the same (latest) version:

image: ghcr.io/siderolabs/drbd:9.2.4-v1.5.4

As stated, the exact same config works fine on two other x86 nodes; the issue is only on the rock64, which is arm64.

Ulrar, Oct 26 '23 10:10

You can use talosctl get extensions to see which extensions are installed.
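
A minimal sketch of that check as a shell helper. The function name has_extension and the node name mynode are illustrative, and it assumes the extension's name shows up somewhere in the output of talosctl get extensions:

```shell
# has_extension NODE NAME
# Succeeds if the `talosctl get extensions` output for NODE mentions NAME.
# Assumption: the extension name appears as plain text in that output.
has_extension() {
  node="$1"
  name="$2"
  talosctl --talosconfig talosconfig -n "$node" get extensions | grep -qi -- "$name"
}

# Example (hypothetical node name):
#   has_extension mynode drbd && echo "drbd extension installed" || echo "drbd extension missing"
```

If the helper reports the extension as missing, the upgrade that was meant to rebuild the image most likely never completed on that node.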

smira, Oct 26 '23 10:10

You can check for yourself that the extension does contain the files, so the problem is probably somewhere on your end:

$ crane export ghcr.io/siderolabs/drbd:9.2.4-v1.5.4@sha256:908a2e1129ae6434c5af887b9f3ba7fde039b635e471cef2be808e017d464275 - | tar tv
-rw-r--r-- 0/0             272 2022-01-20 22:35 manifest.yaml
drwxr-xr-x 0/0               0 2022-01-20 22:35 rootfs
drwxr-xr-x 0/0               0 2022-01-20 22:35 rootfs/lib
drwxr-xr-x 0/0               0 2022-01-20 22:35 rootfs/lib/modules
drwxr-xr-x 0/0               0 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos
drwxr-xr-x 0/0               0 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/extras
-rw-r--r-- 0/0         1141122 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/extras/drbd.ko
-rw-r--r-- 0/0           88162 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/extras/drbd_transport_rdma.ko
-rw-r--r-- 0/0           49410 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/extras/drbd_transport_tcp.ko
-rw-r--r-- 0/0              74 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.alias
-rw-r--r-- 0/0              48 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.alias.bin
-rw-r--r-- 0/0           58621 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.builtin
-rw-r--r-- 0/0           42432 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.builtin.alias.bin
-rw-r--r-- 0/0           64021 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.builtin.bin
-rw-r--r-- 0/0          362817 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.builtin.modinfo
-rw-r--r-- 0/0             107 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.dep
-rw-r--r-- 0/0             191 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.dep.bin
-rw-r--r-- 0/0               0 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.devname
-rw-r--r-- 0/0            2058 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.order
-rw-r--r-- 0/0              55 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.softdep
-rw-r--r-- 0/0             611 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.symbols
-rw-r--r-- 0/0             752 2022-01-20 22:35 rootfs/lib/modules/6.1.58-talos/modules.symbols.bin

smira, Oct 26 '23 10:10

Alright, after a lot of digging I think I figured it out. The rock64 doesn't really have enough memory to schedule much, and certainly not the piraeus-operator. Even without the operator, the upgrade command silently kills the node unless I use --stage, I'm guessing because there isn't enough memory to run the installer and the whole stack at the same time.
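
For anyone wanting to confirm a memory shortage like this before upgrading, here is a rough sketch that reads /proc/meminfo from the node through talosctl read and prints the available memory as a percentage of the total. The helper name mem_available_pct is made up for illustration, and it assumes the node's kernel reports MemAvailable:

```shell
# mem_available_pct NODE
# Prints MemAvailable as a whole-number percentage of MemTotal for NODE.
# Assumption: /proc/meminfo on the node has MemTotal and MemAvailable lines.
mem_available_pct() {
  talosctl --talosconfig talosconfig -n "$1" read /proc/meminfo |
    awk '/^MemTotal:/ {total = $2}
         /^MemAvailable:/ {avail = $2}
         END {if (total > 0) printf "%d\n", avail * 100 / total}'
}

# Example (hypothetical node name):
#   mem_available_pct mynode
```

A very low figure on a small board would be consistent with the installer being killed partway through a non-staged upgrade.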

Using --stage I did manage to get drbd installed correctly, but that doesn't leave enough RAM to schedule the piraeus-operator (it brings the node up to 107% usage).
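
A sketch of what the staged upgrade can look like, wrapped in a small helper. The function name staged_upgrade and the node name are illustrative, and the installer image in the example is an assumption based on the v1.5.4 version mentioned earlier in the thread, not the exact command that was run:

```shell
# staged_upgrade NODE IMAGE
# Stages the upgrade so it is applied after a reboot rather than while the
# full stack is still running on the node.
staged_upgrade() {
  node="$1"
  image="$2"
  talosctl --talosconfig talosconfig -n "$node" upgrade --image "$image" --stage
}

# Example (hypothetical node name, assumed installer image):
#   staged_upgrade mynode ghcr.io/siderolabs/installer:v1.5.4
```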

Never mind, I'll get rid of that node. Thanks for your help.

Ulrar, Oct 26 '23 14:10