
microceph daemons stop before ceph consumer daemons stop during host shutdown

Status: Open. tregubovav-dev opened this issue 10 months ago (0 comments).


All MicroCeph daemons, including the OSDs and monitors, stop before Ceph consumers such as the LXD or Incus daemons during host shutdown. In some situations this behavior causes data loss and/or abnormal system behavior, for example during a graceful cluster shutdown in a power outage.
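One way to read this report: systemd only orders the shutdown of two units when an ordering dependency (After=/Before=) exists between them, and at shutdown it applies the inverse of the start-up order. The Python sketch below is a simplified model of that rule, not systemd's real job scheduler: it groups units into batches that may stop concurrently. The unit names are assumptions about how the snaps name their services, and both `after` mappings are hypothetical, not read from a real host.

```python
from graphlib import TopologicalSorter

# Assumed snap service unit names; confirm them on a real host.
LXD = "snap.lxd.daemon.service"
MON = "snap.microceph.mon.service"
OSD = "snap.microceph.osd.service"


def shutdown_batches(after: dict[str, set[str]]) -> list[set[str]]:
    """Simplified model of systemd shutdown ordering.

    `after` maps each unit to the units it is ordered After=. Start-up
    proceeds in topological order and shutdown applies the inverse. Units
    in the same batch have no ordering between them and may stop in any
    order or concurrently.
    """
    sorter = TopologicalSorter(after)
    sorter.prepare()
    start_batches: list[set[str]] = []
    while sorter.is_active():
        ready = set(sorter.get_ready())
        start_batches.append(ready)
        sorter.done(*ready)
    # Shutdown runs the start-up batches in reverse.
    return list(reversed(start_batches))


# Hypothetical: no ordering between LXD and the MicroCeph units.
unordered = {LXD: set(), MON: set(), OSD: set()}
# Hypothetical: LXD ordered After= the Ceph monitor and OSD units.
ordered = {LXD: {MON, OSD}, MON: set(), OSD: set()}

print(shutdown_batches(unordered))
print(shutdown_batches(ordered))
```

With no ordering edge, all three units fall into a single batch, so nothing stops the monitor or OSD from going down while LXD is still running. With LXD ordered After= the Ceph units, LXD lands alone in the first shutdown batch.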

What version of MicroCeph are you using?

  • MicroCeph reef/stable, from snap channel latest/stable
  • LXD 5.21, from snap channel latest/stable
  • Ubuntu 23.10 Server (all packages updated to the latest versions)

What are the steps to reproduce this issue?

  1. Deploy 3 host nodes (VMs or physical) running Ubuntu 22.04 or 23.10 Server and update all packages; attach one dedicated disk to every node to be used for Ceph storage.
  2. Install the microceph snap from the latest/stable channel.
  3. Switch the LXD snap to the latest/stable channel.
  4. Configure the MicroCeph cluster, join all nodes to it, and add each node's disk to the cluster.
  5. Configure the LXD cluster, join all nodes to it, and configure Ceph storage for the LXD cluster.
  6. Restart all nodes at the same time and watch the shutdown log output on screen. You may see that the MicroCeph daemons stop before the LXD daemons (see screenshot: [image]). However, this does not affect the shutdown process as long as no instances on Ceph storage are deployed and running.
  7. Deploy and launch any instances in LXD using the Ceph storage.
  8. Restart all nodes at the same time.

What happens (observed behaviour)?

  • All nodes get stuck during shutdown, waiting for the LXD services to stop (up to 10 minutes in my case). [image]
  • After some timeout, libceph starts reporting lost communication with the OSDs and monitors: [image]

LXD can no longer communicate with Ceph because all monitors and OSDs in the cluster have already shut down, while the LXD instances are still running and have already lost their Ceph storage.

What were you expecting to happen?

LXD and other Ceph consumers should be stopped before the MicroCeph services stop during host shutdown.
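One possible host-level workaround would be a systemd drop-in that orders the consumer After= the MicroCeph units; because systemd reverses ordering at shutdown, the consumer would then stop first. The sketch below only renders such a drop-in as text and is not a fix confirmed by the MicroCeph project. The unit names and the file name `10-after-microceph.conf` are assumptions (check the real names with `systemctl list-units 'snap.*'`), an Incus host would pass its own daemon unit as the consumer, and a drop-in only takes effect after `systemctl daemon-reload`.

```python
from pathlib import PurePosixPath

# Assumed unit names for the LXD daemon and the MicroCeph services.
CONSUMER = "snap.lxd.daemon.service"
PROVIDERS = ["snap.microceph.mon.service", "snap.microceph.osd.service"]


def render_drop_in(consumer: str, providers: list[str]) -> tuple[PurePosixPath, str]:
    """Return a drop-in path and body that order `consumer` After= `providers`.

    Since systemd applies the inverse order at shutdown, `consumer` is then
    stopped before any of `providers`. After= only orders units; it does
    not make the consumer require or pull in the providers.
    """
    # Hypothetical, arbitrary file name inside the unit's drop-in directory.
    path = PurePosixPath("/etc/systemd/system") / f"{consumer}.d" / "10-after-microceph.conf"
    body = "[Unit]\nAfter=" + " ".join(providers) + "\n"
    return path, body


path, body = render_drop_in(CONSUMER, PROVIDERS)
print(path)
print(body, end="")
```

Rendering the text rather than writing it keeps the sketch side-effect free; an administrator would review the output before placing it on a host.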

Relevant logs, error output, etc.


Additional comments.

tregubovav-dev, Apr 17 '24 18:04