
ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-3: (2) No such file or directory pvc based rook

Open 13567436138 opened this issue 2 years ago • 4 comments

Is this a bug report or feature request?

  • Bug Report
root@storage-b-01:~# kubectl logs -n rook-ceph rook-ceph-osd-3-5d7f669478-cn84c
Defaulted container "osd" out of: osd, log-collector, blkdevmapper (init), activate (init), expand-bluefs (init), chown-container-data-dir (init)
debug 2024-02-11T12:46:00.648+0000 7f12e5ad8640  0 set uid:gid to 167:167 (ceph:ceph)
debug 2024-02-11T12:46:00.648+0000 7f12e5ad8640  0 ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable), process ceph-osd, pid 385
debug 2024-02-11T12:46:00.648+0000 7f12e5ad8640  0 pidfile_write: ignore empty --pid-file
debug 2024-02-11T12:46:00.648+0000 7f12e5ad8640 -1 bluestore(/var/lib/ceph/osd/ceph-3/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-3/block: (13) Permission denied
debug 2024-02-11T12:46:00.648+0000 7f12e5ad8640 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-3: (2) No such file or directory

Deviation from expected behavior:

Expected behavior:

How to reproduce it (minimal and precise):

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary

Logs to submit:

  • Operator's logs, if necessary

  • Crashing pod(s) logs, if necessary

To get logs, use kubectl -n <namespace> logs <pod name>. When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI. Read the GitHub documentation if you need help.

Cluster Status to submit:

  • Output of kubectl commands, if necessary

To get the health of the cluster, use kubectl rook-ceph health. To get the status of the cluster, use kubectl rook-ceph ceph status. For more details, see the Rook kubectl Plugin.
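The collection steps the template asks for can be sketched as follows; the rook-ceph namespace is the usual default, and the OSD pod name is taken from the log at the top of this issue:

```shell
# Collect the logs and cluster status requested by the template.
# Namespace "rook-ceph" is an assumption (Rook's default).
kubectl -n rook-ceph logs deploy/rook-ceph-operator          # operator logs
kubectl -n rook-ceph logs rook-ceph-osd-3-5d7f669478-cn84c   # crashing OSD pod logs
kubectl rook-ceph health                                     # plugin: cluster health
kubectl rook-ceph ceph status                                # plugin: runs "ceph status"
```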

Environment:

  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod):
  • Storage backend version (e.g. for ceph do ceph -v):
  • Kubernetes version (use kubectl version):
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):

13567436138 avatar Feb 11 '24 12:02 13567436138

It's possible that the disk backing the OSD PVC is no longer available, or was renamed after the node reboot.
@13567436138 Can you please confirm whether that's the case?

sp98 avatar Feb 12 '24 08:02 sp98
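One way to check the renamed-disk theory above is to compare the device that backed the OSD PVC with what the node sees after the reboot. A minimal sketch, assuming the usual rook-ceph namespace; serial/WWN identifiers survive renames, so they are a more stable reference than /dev/sdX names:

```shell
# On the node: list block devices with identifiers that survive renames.
lsblk -o NAME,SIZE,SERIAL,WWN,MOUNTPOINT

# From a kubectl host: find which PV/PVC backs the failing OSD.
# Pod name is taken from the log above; namespace is assumed to be rook-ceph.
kubectl -n rook-ceph describe pod rook-ceph-osd-3-5d7f669478-cn84c | grep -i claim
kubectl -n rook-ceph get pvc
kubectl get pv -o wide       # PVs are cluster-scoped
```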

I think it's related to the PVC; can you confirm?

hermanlondon avatar Feb 12 '24 11:02 hermanlondon

Yes, I restarted the node, because I am using VMware Workstation.

13567436138 avatar Feb 14 '24 06:02 13567436138

I deleted all the OSD pods, and now all OSD pods are OK.

bash-4.4$ ceph -s
  cluster:
    id:     7e4a1d4e-7041-454f-bbf6-6784c7a6fc90
    health: HEALTH_WARN
            Reduced data availability: 1 pg inactive
            Degraded data redundancy: 1 pg undersized
            16 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum a,b,c (age 14m)
    mgr: b(active, since 15m), standbys: a
    osd: 7 osds: 7 up (since 78s), 7 in (since 78s)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   994 MiB used, 34 GiB / 35 GiB avail
    pgs:     100.000% pgs not active
             1 undersized+peered
How can I solve this "100.000% pgs not active" problem???

13567436138 avatar Feb 15 '24 03:02 13567436138
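On the remaining undersized+peered PG: a fresh Reef cluster typically has only the .mgr pool, and with Rook's default host failure domain a size-3 pool cannot go active unless its replicas land on distinct hosts. A hedged diagnostic sketch (the .mgr pool name is an assumption), run from the toolbox:

```shell
# Compare the pool's requested replica count with the hosts CRUSH can use.
ceph osd pool ls                      # find the pool behind the single PG
ceph osd pool get .mgr size           # ".mgr" is an assumed pool name
ceph osd tree                         # are all 7 OSDs on one host?
ceph pg dump_stuck undersized         # which PG is short on replicas

# For a single-node test cluster only: allow replicas on distinct OSDs
# rather than distinct hosts, then point the pool at the new rule.
ceph osd crush rule create-replicated replicated-osd default osd
ceph osd pool set .mgr crush_rule replicated-osd
```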

> yes i have restart the node,because i am using vmware workstation

Thanks for this! I expanded a disk on my ESXi VM from 128GB to 256GB while it was running. I did a rescan and Ubuntu Server picked up the new size. Ceph did as well, after I marked the OSD out and down. But when the OSD started back up, Ceph still reported 128GB of space. After rebooting the VM, Ceph did whatever magic it needed to and now shows the correct usage and total size.

dgioulakis avatar Apr 12 '24 21:04 dgioulakis
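For reference, the resize path described above can be sketched without a full VM reboot; the device name sdb is an assumption, and the rook-ceph-osd-3 deployment name comes from the pod name in the logs at the top of this issue:

```shell
# 1. After growing the virtual disk in the hypervisor, rescan so the guest
#    kernel picks up the new size ("sdb" is an assumed device name).
echo 1 | sudo tee /sys/class/block/sdb/device/rescan

# 2. Restart the OSD deployment; Rook's expand-bluefs init container
#    (visible in the pod's init container list above) grows BlueFS
#    onto the new capacity.
kubectl -n rook-ceph rollout restart deploy/rook-ceph-osd-3

# 3. Confirm Ceph now reports the larger capacity.
ceph osd df
```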

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jun 12 '24 20:06 github-actions[bot]

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

github-actions[bot] avatar Jun 20 '24 20:06 github-actions[bot]