
ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-3: (2) No such file or directory pvc based rook

Open 13567436138 opened this issue 2 years ago • 4 comments

Is this a bug report or feature request?

  • Bug Report
root@storage-b-01:~# kubectl logs -n rook-ceph rook-ceph-osd-3-5d7f669478-cn84c
Defaulted container "osd" out of: osd, log-collector, blkdevmapper (init), activate (init), expand-bluefs (init), chown-container-data-dir (init)
debug 2024-02-11T12:46:00.648+0000 7f12e5ad8640  0 set uid:gid to 167:167 (ceph:ceph)
debug 2024-02-11T12:46:00.648+0000 7f12e5ad8640  0 ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable), process ceph-osd, pid 385
debug 2024-02-11T12:46:00.648+0000 7f12e5ad8640  0 pidfile_write: ignore empty --pid-file
debug 2024-02-11T12:46:00.648+0000 7f12e5ad8640 -1 bluestore(/var/lib/ceph/osd/ceph-3/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-3/block: (13) Permission denied
debug 2024-02-11T12:46:00.648+0000 7f12e5ad8640 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-3: (2) No such file or directory

Deviation from expected behavior:

Expected behavior:

How to reproduce it (minimal and precise):

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary

Logs to submit:

  • Operator's logs, if necessary

  • Crashing pod(s) logs, if necessary

To get logs, use kubectl -n <namespace> logs <pod name>. When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI. Read the GitHub documentation if you need help.

Cluster Status to submit:

  • Output of kubectl commands, if necessary

To get the health of the cluster, use kubectl rook-ceph health. To get the status of the cluster, use kubectl rook-ceph ceph status. For more details, see the Rook kubectl Plugin.
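The collection steps the template asks for can be sketched as follows; the rook-ceph namespace is the usual default, and the OSD pod name is taken from the log at the top of this issue:

```shell
# Collect the logs and cluster status requested by the template.
# Namespace "rook-ceph" is an assumption (Rook's default).
kubectl -n rook-ceph logs deploy/rook-ceph-operator          # operator logs
kubectl -n rook-ceph logs rook-ceph-osd-3-5d7f669478-cn84c   # crashing OSD pod logs
kubectl rook-ceph health                                     # plugin: cluster health
kubectl rook-ceph ceph status                                # plugin: runs "ceph status"
```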

Environment:

  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod):
  • Storage backend version (e.g. for ceph do ceph -v):
  • Kubernetes version (use kubectl version):
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):

13567436138 avatar Feb 11 '24 12:02 13567436138

It's possible that the disk backing the OSD PVC is no longer available, or was renamed after the node reboot.
@13567436138 Can you please confirm whether that's the case?

sp98 avatar Feb 12 '24 08:02 sp98
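One way to check the renamed-disk theory above is to compare the device that backed the OSD PVC with what the node sees after the reboot. A minimal sketch, assuming the usual rook-ceph namespace; serial/WWN identifiers survive renames, so they are a more stable reference than /dev/sdX names:

```shell
# On the node: list block devices with identifiers that survive renames.
lsblk -o NAME,SIZE,SERIAL,WWN,MOUNTPOINT

# From a kubectl host: find which PV/PVC backs the failing OSD.
# Pod name is taken from the log above; namespace is assumed to be rook-ceph.
kubectl -n rook-ceph describe pod rook-ceph-osd-3-5d7f669478-cn84c | grep -i claim
kubectl -n rook-ceph get pvc
kubectl get pv -o wide       # PVs are cluster-scoped
```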

I think it's related to the PVC; can you confirm?

hermanlondon avatar Feb 12 '24 11:02 hermanlondon

Yes, I restarted the node, because I am using VMware Workstation.

13567436138 avatar Feb 14 '24 06:02 13567436138

I deleted all the OSD pods, and now all OSD pods are OK.

bash-4.4$ ceph -s
  cluster:
    id:     7e4a1d4e-7041-454f-bbf6-6784c7a6fc90
    health: HEALTH_WARN
            Reduced data availability: 1 pg inactive
            Degraded data redundancy: 1 pg undersized
            16 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum a,b,c (age 14m)
    mgr: b(active, since 15m), standbys: a
    osd: 7 osds: 7 up (since 78s), 7 in (since 78s)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   994 MiB used, 34 GiB / 35 GiB avail
    pgs:     100.000% pgs not active
             1 undersized+peered
How can I solve this "100.000% pgs not active" problem???

13567436138 avatar Feb 15 '24 03:02 13567436138
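On the remaining undersized+peered PG: a fresh Reef cluster typically has only the .mgr pool, and with Rook's default host failure domain a size-3 pool cannot go active unless its replicas land on distinct hosts. A hedged diagnostic sketch (the .mgr pool name is an assumption), run from the toolbox:

```shell
# Compare the pool's requested replica count with the hosts CRUSH can use.
ceph osd pool ls                      # find the pool behind the single PG
ceph osd pool get .mgr size           # ".mgr" is an assumed pool name
ceph osd tree                         # are all 7 OSDs on one host?
ceph pg dump_stuck undersized         # which PG is short on replicas

# For a single-node test cluster only: allow replicas on distinct OSDs
# rather than distinct hosts, then point the pool at the new rule.
ceph osd crush rule create-replicated replicated-osd default osd
ceph osd pool set .mgr crush_rule replicated-osd
```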

> yes i have restart the node,because i am using vmware workstation

Thanks for this! I expanded a disk on my ESXi VM from 128GB to 256GB while it was running. I did a rescan and Ubuntu Server picked up the new size. Ceph did as well, after I marked the OSD out and down. But when the OSD started back up, Ceph still reported 128GB of space. After rebooting the VM, Ceph did whatever magic it needed to and now shows the correct usage and total size.

dgioulakis avatar Apr 12 '24 21:04 dgioulakis
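For reference, the resize path described above can be sketched without a full VM reboot; the device name sdb is an assumption, and the rook-ceph-osd-3 deployment name comes from the pod name in the logs at the top of this issue:

```shell
# 1. After growing the virtual disk in the hypervisor, rescan so the guest
#    kernel picks up the new size ("sdb" is an assumed device name).
echo 1 | sudo tee /sys/class/block/sdb/device/rescan

# 2. Restart the OSD deployment; Rook's expand-bluefs init container
#    (visible in the pod's init container list above) grows BlueFS
#    onto the new capacity.
kubectl -n rook-ceph rollout restart deploy/rook-ceph-osd-3

# 3. Confirm Ceph now reports the larger capacity.
ceph osd df
```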

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jun 12 '24 20:06 github-actions[bot]

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

github-actions[bot] avatar Jun 20 '24 20:06 github-actions[bot]