microceph

ubuntu@canonical:~$ sudo microceph cluster bootstrap
Error: Post "http://control.socket/cluster/control": context deadline exceeded

Sep 21 '23 02:09 KyleSanderson

Hi @KyleSanderson, this is a fairly generic error that basically says the backend daemon was unreachable.

Do you have steps to reproduce this issue? Which microceph version did you use?

Thanks.

Sep 21 '23 13:09 sabaini

I know - just saying the timeout is clearly too low. Whatever was on snap at this time.

Sep 21 '23 23:09 KyleSanderson

Not against raising the timeout in principle, but the timeout is 30s, which does not seem unreasonable for a relatively lightweight operation. What kind of delay have you been seeing? I wonder what a good value for a timeout would be.
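One way to put a number on the delay is to time the command from the outside. The following is only a rough sketch: `time_command` is a hypothetical helper, not part of microceph, and the 120-second outer limit is an arbitrary assumption. Because the client gives up on its own after its internal deadline, this mainly shows whether the failure happens right at that deadline or earlier.

```python
import subprocess
import time


def time_command(argv, timeout=120.0):
    """Run argv and return (exit_code, elapsed_seconds).

    exit_code is None if the command was still running after `timeout`
    seconds and had to be killed. `timeout` is an outer safety limit
    chosen for this sketch, unrelated to microceph's own deadline.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        code = result.returncode
    except subprocess.TimeoutExpired:
        code = None
    return code, time.monotonic() - start


# Example call (commented out so that loading this file has no side effects):
# code, elapsed = time_command(["sudo", "microceph", "cluster", "bootstrap"])
# print(f"exit code: {code}, elapsed: {elapsed:.1f}s")
```

An elapsed time close to 30 seconds together with a non-zero exit code would suggest the client-side deadline was hit rather than an immediate connection failure.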

Sep 22 '23 14:09 sabaini

I also get the same error

Oct 11 '23 03:10 supanadit

I'm actually following this guide: https://microk8s.io/docs/how-to-ceph. The error comes up when I run sudo microceph cluster bootstrap immediately after running sudo snap install microceph --channel=latest/edge

My server OS is Ubuntu 22.04.3 LTS

Oct 11 '23 03:10 supanadit

Are you running this on a raid0 NVMe setup?

Oct 11 '23 07:10 KyleSanderson

My server uses regular HDD storage with no RAID configuration. @KyleSanderson

Oct 12 '23 02:10 supanadit

Oh, that's to be expected then.

Oct 12 '23 03:10 KyleSanderson

@KyleSanderson Hi, Solutions QA saw this in several test runs, after microceph cluster join. We do have bcache on spinning disks, but still. Do you think there's any workaround or setting we could add to eliminate this?

test run: https://solutions.qa.canonical.com/testruns/d7c57bfe-83e2-4583-a3cc-b66cdfa2a377
logs: https://oil-jenkins.canonical.com/artifacts/d7c57bfe-83e2-4583-a3cc-b66cdfa2a377/index.html

Nov 15 '23 15:11 jeffreychang911

So the timeout increase is merged and should be available on /edge shortly; let's see if this issue rears its ugly head again.

Nov 16 '23 08:11 sabaini

Thanks, I just ran into this today as well. Cluster join works fine, but I get this error after a reboot:

$ microceph status
Error: Failed listing disks: Get "http://control.socket/1.0/disks": dial unix /var/snap/microceph/common/state/control.socket: connect: no such file or directory

Fixed itself after a few reboots (5 times) 🤣
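The error above says the control socket file did not exist yet when the client tried to connect. As a possible workaround after a reboot, a script could poll for the socket before calling microceph. This is only a sketch under assumptions: `wait_for_socket` is a hypothetical helper, the 60-second limit is arbitrary, and the path is copied from the error message. The socket file appearing does not guarantee the daemon is ready to serve requests.

```python
import os
import stat
import time

# Path taken from the error message above.
CONTROL_SOCKET = "/var/snap/microceph/common/state/control.socket"


def wait_for_socket(path=CONTROL_SOCKET, timeout=60.0, interval=1.0):
    """Poll until `path` exists and is a Unix socket.

    Returns True once the socket is found, or False if `timeout`
    seconds pass without it appearing.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat.S_ISSOCK(os.stat(path).st_mode):
                return True
        except OSError:
            # Missing file (or unreadable parent directory): keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Reading the snap's state directory may require root, so the check would likely need to run with sudo. If it returns False, the daemon most likely never started, and its service logs would be the next place to look.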

Nov 26 '23 20:11 dvh312

It always happens after rebooting, and I have to purge and install to fix it.

Apr 15 '24 14:04 nakano57

I believe the original timeout issue should be fixed. See ticket #342 for issues on upgrading.

Jun 03 '24 16:06 sabaini