RPi 5: OSD down after reboot
Describe the bug
When I create a Ceph OSD it works without problems, but as soon as I reboot the node the OSD won't come back up.

To Reproduce
Steps to reproduce the behavior (roughly equivalent commands follow the list):
- Install Proxmox and Ceph (using your repos of course)
- Create OSD
- Reboot
- OSD gone
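Roughly the command-line equivalent of the steps above (the device path is the one from my setup; creating the OSD through the GUI behaves the same):

pveceph osd create /dev/sda   # create the bluestore OSD
reboot
ceph osd tree                 # after the reboot the new OSD is reported as down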
ENV (please complete the following information):
- OS: Debian GNU/Linux 12 (bookworm)
- ARCH: arm64
- Raspberry Pi 5
pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.6.20+rpt-rpi-v8)
pve-manager: 8.1.3+pve1 (running version: 8.1.3+pve1/26764642342c55bb)
proxmox-kernel-helper: 8.1.0
ceph: 18.2.0-pve2
ceph-fuse: 18.2.0-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx7
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0-1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.4
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.2-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: not correctly installed
pve-firewall: 5.0.3
pve-firmware: 3.8-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.4
pve-qemu-kvm: 8.1.2-4
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.10+pve1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.0-pve4
Additional context
systemctl status 'ceph-osd@*.service' doesn't return anything, and journalctl -xeu ceph-osd@2.service shows no entries.
I double-checked that I am on the right host and using the right OSD number.
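Spelled out, the checks above, plus a status check of the ceph-volume activation unit that the install enabled (OSD id 2 and the unit name are the ones from the creation output below):

systemctl status 'ceph-osd@*.service'                                     # returns nothing
journalctl -xeu ceph-osd@2.service                                        # no entries
systemctl status ceph-volume@lvm-2-8331d767-af24-40da-bac0-ccbaf0fcda92   # activation unit enabled during OSD creation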
Output of the OSD install:
The ZFS modules cannot be auto-loaded.
Try running 'modprobe zfs' as root to manually load them.
command '/sbin/zpool list -HPLv' failed: exit code 1
create OSD on /dev/sda (bluestore)
wiping block device /dev/sda
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.044 s, 201 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 8331d767-af24-40da-bac0-ccbaf0fcda92
Running command: vgcreate --force --yes ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099 /dev/sda
stdout: Physical volume "/dev/sda" successfully created.
stdout: Volume group "ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099" successfully created
Running command: lvcreate --yes -l 476924 -n osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92 ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099
stdout: Logical volume "osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
Running command: /bin/chown -h ceph:ceph /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/ln -s /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92 /var/lib/ceph/osd/ceph-2/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-2/activate.monmap
stderr: 2024-03-13T09:48:04.607+0100 7fb083f180 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2024-03-13T09:48:04.607+0100 7fb083f180 -1 AuthRegistry(0x7fac063e30) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
stderr: got monmap epoch 5
--> Creating keyring file for osd.2
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 2 --monmap /var/lib/ceph/osd/ceph-2/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-2/ --osd-uuid 8331d767-af24-40da-bac0-ccbaf0fcda92 --setuser ceph --setgroup ceph
stderr: 2024-03-13T09:48:05.071+0100 7f93c67040 -1 bluestore(/var/lib/ceph/osd/ceph-2//block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
stderr: 2024-03-13T09:48:05.075+0100 7f93c67040 -1 bluestore(/var/lib/ceph/osd/ceph-2//block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
stderr: 2024-03-13T09:48:05.075+0100 7f93c67040 -1 bluestore(/var/lib/ceph/osd/ceph-2//block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
stderr: 2024-03-13T09:48:05.079+0100 7f93c67040 -1 bluestore(/var/lib/ceph/osd/ceph-2/) _read_fsid unparsable uuid
--> ceph-volume lvm prepare successful for: /dev/sda
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92 --path /var/lib/ceph/osd/ceph-2 --no-mon-config
Running command: /bin/ln -snf /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92 /var/lib/ceph/osd/ceph-2/block
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /bin/systemctl enable ceph-volume@lvm-2-8331d767-af24-40da-bac0-ccbaf0fcda92
stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-2-8331d767-af24-40da-bac0-ccbaf0fcda92.service -> /lib/systemd/system/ceph-volume@.service.
Running command: /bin/systemctl enable --runtime ceph-osd@2
stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@2.service -> /lib/systemd/system/ceph-osd@.service.
Running command: /bin/systemctl start ceph-osd@2
--> ceph-volume lvm activate successful for osd ID: 2
--> ceph-volume lvm create successful for: /dev/sda
TASK OK
/var/log/ceph is there and has OSD logs:
[2024-03-13 13:19:12,115][ceph_volume.main][INFO ] Running command: ceph-volume lvm trigger 1-18b2426f-90d1-4992-847c-a52b7ef19dc7
[2024-03-13 13:19:12,120][ceph_volume.util.system][WARNING] Executable lvs not found on the host, will return lvs as-is
[2024-03-13 13:19:12,120][ceph_volume.process][INFO ] Running command: lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S tags={ceph.osd_id=1,ceph.osd_fsid=18b2426f-90d1-4992-847c-a52b7ef19dc7} -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2024-03-13 13:19:12,151][ceph_volume.main][INFO ] Running command: ceph-volume lvm trigger 2-611efe97-8305-4a23-9559-33dd95bce599
[2024-03-13 13:19:12,154][ceph_volume.util.system][WARNING] Executable lvs not found on the host, will return lvs as-is
[2024-03-13 13:19:12,154][ceph_volume.process][INFO ] Running command: lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S tags={ceph.osd_id=2,ceph.osd_fsid=611efe97-8305-4a23-9559-33dd95bce599} -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2024-03-13 13:19:12,192][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 59, in newfunc
return f(*a, **kw)
^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/ceph_volume/main.py", line 153, in main
terminal.dispatch(self.mapper, subcommand_args)
File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
instance.main()
File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/main.py", line 46, in main
terminal.dispatch(self.mapper, self.argv)
File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
instance.main()
File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 16, in is_root
return func(*a, **kw)
^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/trigger.py", line 70, in main
Activate(['--auto-detect-objectstore', osd_id, osd_uuid]).main()
File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 281, in main
self.activate(args)
File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 16, in is_root
return func(*a, **kw)
^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 197, in activate
raise RuntimeError('could not find osd.%s with osd_fsid %s' %
RuntimeError: could not find osd.1 with osd_fsid 18b2426f-90d1-4992-847c-a52b7ef19dc7
[2024-03-13 13:19:12,220][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 59, in newfunc
return f(*a, **kw)
^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/ceph_volume/main.py", line 153, in main
terminal.dispatch(self.mapper, subcommand_args)
File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
instance.main()
File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/main.py", line 46, in main
terminal.dispatch(self.mapper, self.argv)
File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
instance.main()
File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 16, in is_root
return func(*a, **kw)
^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/trigger.py", line 70, in main
Activate(['--auto-detect-objectstore', osd_id, osd_uuid]).main()
File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 281, in main
self.activate(args)
File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 16, in is_root
return func(*a, **kw)
^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 197, in activate
raise RuntimeError('could not find osd.%s with osd_fsid %s' %
RuntimeError: could not find osd.2 with osd_fsid 611efe97-8305-4a23-9559-33dd95bce599
[2024-03-13 13:19:12,365][ceph_volume.main][INFO ] Running command: ceph-volume lvm trigger 2-8331d767-af24-40da-bac0-ccbaf0fcda92
[2024-03-13 13:19:12,368][ceph_volume.util.system][WARNING] Executable lvs not found on the host, will return lvs as-is
[2024-03-13 13:19:12,368][ceph_volume.process][INFO ] Running command: lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S tags={ceph.osd_id=2,ceph.osd_fsid=8331d767-af24-40da-bac0-ccbaf0fcda92} -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2024-03-13 13:19:12,428][ceph_volume.process][INFO ] stdout ceph.block_device=/dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92,ceph.block_uuid=OUftbF-UGG7-RZfB-tgrn-2KtY-JJe4-5RT0jM,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=594dd1f3-8f66-4a84-bb9b-ab7b6437e739,ceph.cluster_name=ceph,ceph.crush_device_class=,ceph.encrypted=0,ceph.osd_fsid=8331d767-af24-40da-bac0-ccbaf0fcda92,ceph.osd_id=2,ceph.osdspec_affinity=,ceph.type=block,ceph.vdo=0";"/dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92";"osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92";"ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099";"OUftbF-UGG7-RZfB-tgrn-2KtY-JJe4-5RT0jM";"2000364240896
[2024-03-13 13:19:12,428][ceph_volume.devices.lvm.activate][INFO ] auto detecting objectstore
[2024-03-13 13:19:12,432][ceph_volume.devices.lvm.activate][DEBUG ] Found block device (osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92) with encryption: False
[2024-03-13 13:19:12,432][ceph_volume.devices.lvm.activate][DEBUG ] Found block device (osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92) with encryption: False
[2024-03-13 13:19:12,432][ceph_volume.process][INFO ] Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
[2024-03-13 13:19:12,433][ceph_volume.process][INFO ] Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92 --path /var/lib/ceph/osd/ceph-2 --no-mon-config
[2024-03-13 13:19:12,464][ceph_volume.process][INFO ] stderr failed to read label for /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92: (2) No such file or directory
2024-03-13T13:19:12.460+0100 7fb6a1a040 -1 bluestore(/dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92) _read_bdev_label failed to open /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92: (2) No such file or directory
[2024-03-13 13:19:12,467][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 59, in newfunc
return f(*a, **kw)
^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/ceph_volume/main.py", line 153, in main
terminal.dispatch(self.mapper, subcommand_args)
File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
instance.main()
File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/main.py", line 46, in main
terminal.dispatch(self.mapper, self.argv)
File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
instance.main()
File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 16, in is_root
return func(*a, **kw)
^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/trigger.py", line 70, in main
Activate(['--auto-detect-objectstore', osd_id, osd_uuid]).main()
File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 281, in main
self.activate(args)
File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 16, in is_root
return func(*a, **kw)
^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 205, in activate
return activate_bluestore(lvs, args.no_systemd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 112, in activate_bluestore
process.run(prime_command)
File "/usr/lib/python3/dist-packages/ceph_volume/process.py", line 147, in run
raise RuntimeError(msg)
RuntimeError: command returned non-zero exit status: 1
More context:
lvs --version
LVM version: 2.03.16(2) (2022-05-18)
Library version: 1.02.185 (2022-05-18)
Driver version: 4.48.0
Configuration: ./configure --build=aarch64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/aarch64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --libdir=/lib/aarch64-linux-gnu --sbindir=/sbin --with-usrlibdir=/usr/lib/aarch64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --with-udev-prefix=/ --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-dmeventd --enable-editline --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-udev_rules --enable-udev_sync --disable-readline
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 1.8T 0 disk
nvme0n1 259:0 0 238.5G 0 disk
vgchange -ay
1 logical volume(s) in volume group "ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099" now active
lsblk after vgchange -ay
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 1.8T 0 disk
└─ceph--b9cc563f--5758--4ead--bbec--74c6aafb7099-osd--block--8331d767--af24--40da--bac0--ccbaf0fcda92 254:0 0 1.8T 0 lvm
/var/lib/ceph/osd/ceph-2 is empty
Found a workaround:
After a restart, executing vgchange -ay activates the logical volumes, and then all the automations take over.
If the restart was longer ago and the automations have already run into problems, running ceph-volume lvm activate --all afterwards brings the OSD back up again.
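Put together, the manual recovery after a reboot is just these two commands (a sketch of what is described above, nothing more):

vgchange -ay                    # activate the Ceph LVs that were not auto-activated at boot
ceph-volume lvm activate --all  # only needed if the systemd automation already gave up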
Adding
@reboot /usr/sbin/vgchange -ay >> /var/log/vgchange.log 2>&1
to my crontab fixes the issue for me.
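For anyone who wants to copy it, this is how I appended the entry to the root crontab (a one-liner; it assumes there is no conflicting @reboot entry yet):

( crontab -l 2>/dev/null; echo '@reboot /usr/sbin/vgchange -ay >> /var/log/vgchange.log 2>&1' ) | crontab -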
This is a workaround that fixes my specific error, but I think something is off that also impacts hot-plugging etc.
I hope all my information helps to get this fixed 😊
@jiangcuo I have the exact same issue. Two OSDs are working fine; they are SSDs. The last OSD, which is an HDD, works fine until I restart the system. After the restart, the LVM volume is lost and the OSD is in the "Down" state. The above solution didn't work for me.
Any ideas? @wuast94
I'm not using this on my RPis anymore, sorry.