[BUG] changing default disk config via annotation fails if /var/lib/longhorn is a mount from a disk label

Open HasseJohansen opened this issue 11 months ago • 4 comments

Describe the bug

I am using longhorn 1.6.0

I am trying to automatically add an extra disk to longhorn by following the documentation here: https://longhorn.io/docs/archives/1.2.3/advanced-resources/default-disk-and-node-config/

This fails for me. I am specifying this in the annotation: node.longhorn.io/default-disks-config: [{"path":"/var/lib/longhorn","allowScheduling":true},{"name":"flash","path":"/mnt/data","allowScheduling":true,"tags":["ssd","fast"]}]
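To make it easier to see what the annotation value above declares, here is a minimal Python sketch that parses it and lists each disk entry. It is not part of longhorn: the function name `summarize_disks` is hypothetical, and the sketch only reads the JSON as written, without modelling any defaults longhorn may apply to missing fields.

```python
import json

# The annotation value exactly as quoted in this issue.
ANNOTATION = (
    '[{"path":"/var/lib/longhorn","allowScheduling":true},'
    '{"name":"flash","path":"/mnt/data","allowScheduling":true,'
    '"tags":["ssd","fast"]}]'
)


def summarize_disks(annotation_value):
    """Parse a default-disks-config value into (name, path, allowScheduling, tags) tuples.

    Hypothetical helper for illustration only. Fields that are absent from
    an entry are reported as None; longhorn's own defaults are not modelled.
    """
    disks = json.loads(annotation_value)
    if not isinstance(disks, list):
        raise ValueError("expected a JSON array of disk objects")
    summary = []
    for disk in disks:
        if "path" not in disk:
            raise ValueError(f"disk entry without a path: {disk!r}")
        summary.append(
            (
                disk.get("name"),
                disk["path"],
                disk.get("allowScheduling"),
                disk.get("tags"),
            )
        )
    return summary


if __name__ == "__main__":
    for name, path, allow, tags in summarize_disks(ANNOTATION):
        print(f"name={name} path={path} allowScheduling={allow} tags={tags}")
```

Running a check like this before applying the annotation catches malformed JSON early, and it makes the configured paths easy to compare against what is actually mounted on the node.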

I am then seeing an error about the default disk /var/lib/longhorn:

2024-02-26T12:01:44.364582876Z time="2024-02-26T12:01:44Z" level=warning msg="Failed to get filesystem device type of /var/lib/longhorn/" func="controller.(*ClusterInfo).collectNodeDiskCount" file="setting_controller.go:1853" controller=longhorn-setting error="lstat /sys/class/block/COS_PERSISTENT: no such file or directory" node=minis-a562-9b82591f

It seems the problem is that my disk is mounted by label, and labels are not available in /sys/class/block/.

So it fails on disks mounted by label. Labels can be found in /dev/disk/by-label/, where each entry points to the real device.

So for me: /dev/disk/by-label/COS_PERSISTENT -> ../../sda5
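A minimal Python sketch of the lookup I have in mind: follow the symlink under /dev/disk/by-label/ to get the kernel device name, and only then build the /sys/class/block/ path. This is an illustration of the idea, not longhorn's code; the function names `resolve_label` and `sysfs_block_path` are hypothetical.

```python
import os


def resolve_label(label, by_label_dir="/dev/disk/by-label"):
    """Return the kernel device name (e.g. 'sda5') that a filesystem label points to.

    Hypothetical helper: it follows the by-label symlink and keeps only the
    final path component of the target.
    """
    link = os.path.join(by_label_dir, label)
    if not os.path.islink(link):
        raise FileNotFoundError(f"no label symlink at {link}")
    return os.path.basename(os.path.realpath(link))


def sysfs_block_path(label, by_label_dir="/dev/disk/by-label"):
    """Build the /sys/class/block entry for the device behind a label."""
    return os.path.join("/sys/class/block", resolve_label(label, by_label_dir))


if __name__ == "__main__":
    # On my node this would be expected to print /sys/class/block/sda5,
    # assuming the by-label symlink shown above.
    print(sysfs_block_path("COS_PERSISTENT"))
```

The `by_label_dir` parameter exists only so the helper can be exercised against a temporary directory instead of the real /dev tree.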

To Reproduce

Have the disk that is mounted on /var/lib/longhorn mounted by label

Expected behavior

I expect that longhorn can handle disks mounted by label and not only pure block devices, as that excludes a lot of use cases for disks. In my case I am using the Kairos distribution, which uses labels for mounting disks.

Support bundle for troubleshooting

Seems the support bundle is too big to attach or email

Environment

  • Longhorn version: 1.6.0
  • Impacted volume (PV): None created yet
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm via flux
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s via Kairos
    • Number of control plane nodes in the cluster: 2
    • Number of worker nodes in the cluster: 3 (including the 2 control plane nodes)
  • Node config
    • OS type and version: Kairos 2.5.0 Ubuntu 23.10 standard variant
    • Kernel version: 6.5.0
    • CPU per node: 4
    • Memory per node: 16GB
    • Disk type (e.g. SSD/NVMe/HDD): hdd+nvme
    • Network bandwidth between the nodes (Gbps): 1
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal
  • Number of Longhorn volumes in the cluster: None, as I haven't got the basic config ready because of this issue

Additional context

The /var/lib/longhorn disk works fine (I think; it is at least available on each node in the UI). I get this problem when trying to configure an extra disk automatically via labels/annotations.

HasseJohansen avatar Feb 26 '24 13:02 HasseJohansen

2024-02-26T12:01:44.364582876Z time="2024-02-26T12:01:44Z" level=warning msg="Failed to get filesystem device type of /var/lib/longhorn/" func="controller.(*ClusterInfo).collectNodeDiskCount" file="setting_controller.go:1853" controller=longhorn-setting error="lstat /sys/class/block/COS_PERSISTENT: no such file or directory" node=minis-a562-9b82591f

This is a warning-level message and should not lead to a disk issue. Can you provide a support bundle?

derekbit avatar Feb 26 '24 13:02 derekbit

Ahh sorry, I didn't see it was only a warning.

I am trying to find out why my nvme mounted on /data is not getting added to the nodes. I will try some more.

The support bundles I can create right now are unfortunately too big to send (over 300MB)

I think you can close this now then. I will reinstall the cluster and see if /data is added by the annotation. The support bundle will then also be smaller, so I will be able to send it.

Having the label: node.longhorn.io/create-default-disk=config

And the annotation:

node.longhorn.io/default-disks-config: [{"path":"/var/lib/longhorn","allowScheduling":true},{"name":"flash","path":"/mnt/data","allowScheduling":true,"tags":["ssd","fast"]}]

If I then remove the longhorn helm chart and reinstall it, will it honour the annotation, or should the annotation be added after the helm install?

I will test some more and see if I can get it working

HasseJohansen avatar Feb 26 '24 13:02 HasseJohansen

Yes, you can follow the steps at https://longhorn.io/docs/1.6.0/nodes-and-volumes/nodes/default-disk-and-node-config/.

derekbit avatar Feb 26 '24 13:02 derekbit

That is what I have done :) But I haven't been able to get it to work yet.

kubectl -n longhorn-system get setting create-default-disk-labeled-nodes
NAME                                VALUE   AGE
create-default-disk-labeled-nodes   true    6d19h

kubectl describe node minis-4ac3 | grep 'node.longhorn.io/create-default-disk=config'
                    node.longhorn.io/create-default-disk=config

kubectl describe node minis-4ac3 | grep -A1 'node.longhorn.io/default-disks-config'
                    node.longhorn.io/default-disks-config:
                      [{"path":"/var/lib/longhorn","allowScheduling":true},{"name":"flash","path":"/mnt/data","allowScheduling":true,"tags":["ssd","fast"]}]

And it is also mounted on the nodes:

kairos@minis-4ac3:~$ mount|grep /data
/dev/nvme0n1 on /data type ext4 (rw,relatime)

Sorry for the formatting

But I read somewhere that it will only be honoured the first time. So maybe it is not honoured because the default disk was already created when I added these things. I will try with a newly provisioned cluster :)

HasseJohansen avatar Feb 26 '24 14:02 HasseJohansen

Ok. It works perfectly on my newly recreated cluster. So it was probably because longhorn was originally installed with create-default-disk-labeled-nodes not set (so false).

Thanks for the help, and sorry for the confusion (learning k8s + all these other things, I am getting a little cross-eyed).

HasseJohansen avatar Feb 26 '24 21:02 HasseJohansen