piraeus-operator

satellite DaemonSet fails to create any of the required Pods due to a missing Service Account

Open · mmssantos opened this issue 8 months ago · 9 comments

Fresh install of Piraeus Operator 2.8.0 on MicroK8s 1.31.5, running on Ubuntu 24.04.

Apart from the satellite DaemonSet, all Pods come up as expected:

piraeus-datastore   linstor-controller-78b969876c-qtf58                   1/1   Running   0   3m10s   10.1.19.29    k8s-13
piraeus-datastore   linstor-csi-controller-64b8c7856b-xz26q               7/7   Running   0   3m10s   10.1.61.230   k8s-01
piraeus-datastore   piraeus-operator-controller-manager-d455cc747-mslpd   1/1   Running   0   3h58m   10.1.19.28    k8s-13
piraeus-datastore   piraeus-operator-gencert-577cc9bb68-t82c4             1/1   Running   0   3h58m   10.1.19.26    k8s-13

The events show that the DaemonSets fail because a Service Account is missing:

2m25s (x16 over 5m9s) Warning FailedCreate DaemonSet/linstor-satellite.k8s-03 Error creating: pods "linstor-satellite.k8s-03-" is forbidden: error looking up service account piraeus-datastore/satellite: serviceaccount "satellite" not found
2m24s (x16 over 5m8s) Warning FailedCreate DaemonSet/linstor-satellite.k8s-11 Error creating: pods "linstor-satellite.k8s-11-" is forbidden: error looking up service account piraeus-datastore/satellite: serviceaccount "satellite" not found
2m24s (x16 over 5m8s) Warning FailedCreate DaemonSet/linstor-satellite.k8s-12 Error creating: pods "linstor-satellite.k8s-12-" is forbidden: error looking up service account piraeus-datastore/satellite: serviceaccount "satellite" not found
2m22s (x16 over 5m7s) Warning FailedCreate DaemonSet/linstor-satellite.k8s-13 Error creating: pods "linstor-satellite.k8s-13-" is forbidden: error looking up service account piraeus-datastore/satellite: serviceaccount "satellite" not found
2m21s (x16 over 5m6s) Warning FailedCreate DaemonSet/linstor-satellite.k8s-01 Error creating: pods "linstor-satellite.k8s-01-" is forbidden: error looking up service account piraeus-datastore/satellite: serviceaccount "satellite" not found
2m21s (x16 over 5m5s) Warning FailedCreate DaemonSet/linstor-satellite.k8s-02 Error creating: pods "linstor-satellite.k8s-02-" is forbidden: error looking up service account piraeus-datastore/satellite: serviceaccount "satellite" not found
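To pull out just these failures instead of scanning the whole event list, events can be filtered with a field selector on the event reason and the kind of the involved object. This is a minimal sketch assuming the piraeus-datastore namespace from this report; the helper name satellite_create_failures is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list only the FailedCreate events raised by DaemonSets
# in a namespace (defaults to piraeus-datastore, as in this report),
# oldest first by last occurrence.
satellite_create_failures() {
  local ns="${1:-piraeus-datastore}"
  kubectl get events -n "$ns" \
    --field-selector reason=FailedCreate,involvedObject.kind=DaemonSet \
    --sort-by=.lastTimestamp
}
```

Usage: `satellite_create_failures` for the default namespace, or `satellite_create_failures <namespace>` for another one.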

Listing the Service Accounts in the namespace confirms that "satellite" was indeed never created:

kubectl get serviceaccount -n piraeus-datastore

NAME                                  SECRETS   AGE
default                               0         4h1m
linstor-controller                    0         5m56s
linstor-csi-controller                0         5m55s
piraeus-operator-controller-manager   0         4h1m
piraeus-operator-gencert              0         4h1m

Deleting and redeploying the Operator does not resolve the issue; the Service Account is still not created.

mmssantos · Mar 05 '25 18:03

It looks like the Operator could not complete the full reconciliation of the LinstorCluster resource. Can you check the .status of the LinstorCluster resource?
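One way to look at that .status is to print each LinstorCluster's conditions with a JSONPath template. This is a sketch: it assumes the Operator reports progress as the usual .status.conditions entries (type and status), and the helper name linstorcluster_conditions is hypothetical. When in doubt, kubectl get linstorcluster -o yaml shows the complete status, including condition messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "name: Type=Status Type=Status ..." for every
# LinstorCluster. The resource is cluster-scoped, so no -n flag is passed.
# Assumes the status follows the common .status.conditions[] convention.
linstorcluster_conditions() {
  kubectl get linstorcluster -o jsonpath='{range .items[*]}{.metadata.name}{":"}{range .status.conditions[*]}{" "}{.type}{"="}{.status}{end}{"\n"}{end}'
}
```

A condition with status False (or a missing condition) is a hint about which reconciliation step did not complete.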

WanzenBug · Mar 06 '25 07:03

Hello @WanzenBug, and thank you so much for your prompt reply. Apologies for the delayed response; I ran into some other odd issues that made me put this on the back burner for a while.

I traced the failure to a copy/paste error in the patches that the LinstorCluster definition requires when running on MicroK8s. After correcting the typo, and fixing an odd issue where one of the nodes did not have the correct "kubelet" link inside "/var/lib", things are looking much better.
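For reference, here is a quick per-node check for the kubelet link mentioned above. It assumes (not stated in the thread) that the link is meant to point /var/lib/kubelet at the MicroK8s kubelet state directory, /var/snap/microk8s/common/var/lib/kubelet. The function name and its optional root argument, which exists only so the check can be tried against a scratch directory, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check (assumption): on MicroK8s, /var/lib/kubelet should be a
# link to the snap's kubelet state directory. $1 is an optional filesystem
# root (default "/") so the check can be tried against a scratch directory.
check_kubelet_link() {
  local root="${1:-/}" want have
  root="${root%/}"   # "/" becomes "", "/tmp/x/" becomes "/tmp/x"
  want="$(readlink -f "$root/var/snap/microk8s/common/var/lib/kubelet")"
  have="$(readlink -f "$root/var/lib/kubelet")"
  if [ -n "$want" ] && [ "$have" = "$want" ]; then
    echo "ok: $root/var/lib/kubelet -> $want"
  else
    echo "missing or wrong: $root/var/lib/kubelet (expected link to $want)"
    echo "  if the path does not exist yet: ln -s $want $root/var/lib/kubelet"
    return 1
  fi
}
```

Running `check_kubelet_link` on each node (with no argument) reports the nodes whose link is missing or points elsewhere.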

On the Ubuntu 24.04.2 LTS x64-based nodes, all pods have come up successfully.

However, on the Debian Bookworm arm64-based nodes, the linstor-satellite pods fail to come up because the drbd-module-loader container fails during what appears to be the "make" of the required kernel modules:

Running kubectl logs -n piraeus-datastore -c drbd-module-loader linstor-satellite.k8s-11-6ltw2 outputs the following:

Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory '/tmp/pkg/drbd-9.2.12/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/6.6.62+rpt-rpi-v8/build

make -C /lib/modules/6.6.62+rpt-rpi-v8/build    "PRE_CFLAGS=" M=/tmp/pkg/drbd-9.2.12/drbd obj-m=dummy-for-compat.o dummy-for-compat-h.o
/usr/src/linux-headers-6.6.62+rpt-common-rpi/Makefile:1032: /usr/src/linux-headers-6.6.62+rpt-common-rpi/scripts/Makefile.extrawarn: No such file or directory
make[2]: *** No rule to make target '/usr/src/linux-headers-6.6.62+rpt-common-rpi/scripts/Makefile.extrawarn'.  Stop.
make[1]: Leaving directory '/tmp/pkg/drbd-9.2.12/drbd'
make[1]: *** [Makefile:236: compat.h] Error 2
make: *** [Makefile:131: module] Error 2

Could not find the expexted *.ko, see stderr for more details

However, checking the arm64-based nodes shows that the file reported as missing at /usr/src/linux-headers-6.6.62+rpt-common-rpi/Makefile:1032, namely /usr/src/linux-headers-6.6.62+rpt-common-rpi/scripts/Makefile.extrawarn, is present on the filesystem.

Any idea what might be going on here, or anything that could point me in the right direction?

mmssantos · Apr 02 '25 14:04

The issue is probably that we only mount /usr/src from the host. I think on Debian systems some scripts are moved into a separate directory and only symlinked from /usr/src. What's the output of:

readlink -f /usr/src/linux-headers-6.6.62+rpt-common-rpi/scripts/Makefile.extrawarn
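The same check can be run across a whole headers tree: list every symlink whose fully resolved target lies outside /usr/src, i.e. the files a container that mounts only /usr/src from the host would see as dangling. This is a sketch; the function name find_escaping_links is hypothetical, and the default mount root is taken from the explanation above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "link -> target" for every symlink under $1 whose
# fully resolved target lies outside $2 (default /usr/src). A container that
# bind-mounts only $2 from the host would see such links dangling.
find_escaping_links() {
  local dir="$1" mount_root link target
  mount_root="$(readlink -f "${2:-/usr/src}")"
  # -type l matches the links themselves; find does not follow them by default
  find "$dir" -type l | while IFS= read -r link; do
    target="$(readlink -f "$link")" || target="(unresolvable)"
    case "$target" in
      "$mount_root"/*) ;;                           # resolves inside the mount
      *) printf '%s -> %s\n' "$link" "$target" ;;   # escapes the mount
    esac
  done
}
```

For example, `find_escaping_links /usr/src/linux-headers-6.6.62+rpt-common-rpi` on an affected node should report whichever link(s) lead into /usr/lib/linux-kbuild-6.6.62+rpt.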

WanzenBug · Apr 02 '25 14:04

That does seem to be the case. This is the output:

@k8s-11:~$ readlink -f /usr/src/linux-headers-6.6.62+rpt-common-rpi/scripts/Makefile.extrawarn
/usr/lib/linux-kbuild-6.6.62+rpt/scripts/Makefile.extrawarn

And these are the contents of /usr/src:

@k8s-11:~$ ls -las /usr/src/
total 24
4 drwxr-xr-x  6 root root 4096 Feb 28 15:51 .
4 drwxr-xr-x 11 root root 4096 Mar 15  2024 ..
4 drwxr-xr-x  4 root root 4096 Nov 28 23:05 linux-headers-6.6.62+rpt-common-rpi
4 drwxr-xr-x  4 root root 4096 Nov 28 23:05 linux-headers-6.6.62+rpt-rpi-v8
4 drwxr-xr-x  4 root root 4096 Feb 28 15:49 linux-headers-6.6.74+rpt-common-rpi
4 drwxr-xr-x  4 root root 4096 Feb 28 15:49 linux-headers-6.6.74+rpt-rpi-v8
0 lrwxrwxrwx  1 root root   30 Nov 25 15:28 linux-kbuild-6.6.62+rpt -> ../lib/linux-kbuild-6.6.62+rpt
0 lrwxrwxrwx  1 root root   30 Jan 27 17:19 linux-kbuild-6.6.74+rpt -> ../lib/linux-kbuild-6.6.74+rpt

mmssantos · Apr 02 '25 14:04

🤔 I'm wondering if it would be simpler to install the drbd-dkms package directly on the host.

WanzenBug · Apr 02 '25 15:04

Directly from the http://packages.linbit.com/public/ repo?

mmssantos · Apr 02 '25 15:04

Yeah. You might need to pretend to be "proxmox-8", because the public bookworm repos do not include the dkms package.
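A sketch of what "pretending to be proxmox-8" could look like: point APT at the LINBIT public repository with the proxmox-8 suite instead of bookworm, then install drbd-dkms. The repository URL and suite come from this thread; the drbd-9 component name and the keyring path are assumptions to check against LINBIT's own documentation, and the helper linbit_source_line is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an APT source line for LINBIT's public repo.
# The suite defaults to proxmox-8 (as suggested in this thread); the
# "drbd-9" component and the keyring path are assumptions, not verified values.
linbit_source_line() {
  local suite="${1:-proxmox-8}"
  local keyring="${2:-/etc/apt/trusted.gpg.d/linbit-keyring.gpg}"
  printf 'deb [signed-by=%s] http://packages.linbit.com/public/ %s drbd-9\n' \
    "$keyring" "$suite"
}

# Intended use on an arm64 node (as root; LINBIT's package signing key must
# already be imported into the keyring referenced above):
#   linbit_source_line > /etc/apt/sources.list.d/linbit.list
#   apt-get update && apt-get install drbd-dkms
```

Generating the line from a function keeps the suite and keyring explicit, so switching back to a bookworm suite later is a one-argument change.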

WanzenBug · Apr 03 '25 07:04

Thanks for the help; that worked as expected. Are there any plans to fix the compilation issues caused by how Bookworm organizes /usr/src, or will installing drbd-dkms on the host be the way forward for getting this working properly on Bookworm for now?

mmssantos · Apr 04 '25 17:04

Notably, the issue is not with the normal upstream Bookworm (which works just fine) but with the Raspbian variant, which has a custom kernel. For the normal Bookworm image, we already install all the linux-kbuild-* packages, so we have the necessary files in /usr/src. That does not work in this case.
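To see which host directories matter for one of these custom kernels, the naming in the /usr/src listing earlier in the thread suggests a pattern: for kernel release 6.6.62+rpt-rpi-v8, the common headers and the kbuild tree are keyed by 6.6.62+rpt, the release up to its first "-". The sketch below derives the paths from that pattern; it is an assumption drawn only from this thread (it does not hold for standard Debian release strings such as 6.1.0-18-arm64), and the function name rpi_build_dirs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the host directories a DRBD module build reads for
# a Raspberry Pi OS kernel release (default: the running kernel). Assumes the
# "+rpt" naming seen in this thread, where the common headers and the kbuild
# tree are keyed by the release up to its first "-":
#   6.6.62+rpt-rpi-v8 -> 6.6.62+rpt
rpi_build_dirs() {
  local release="${1:-$(uname -r)}"
  local base="${release%%-*}"
  printf '%s\n' \
    "/lib/modules/$release/build" \
    "/usr/src/linux-headers-$release" \
    "/usr/src/linux-headers-$base-common-rpi" \
    "/usr/lib/linux-kbuild-$base"    # outside /usr/src on these systems
}
```

The last entry is the directory that the readlink output above resolved into, which is why mounting /usr/src alone is not enough on these nodes.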

Ideas how to address this welcome 😄

WanzenBug · Apr 07 '25 06:04