ubuntu@canonical:~$ sudo microceph cluster bootstrap Error: Post "http://control.socket/cluster/control": context deadline exceeded

Open KyleSanderson opened this issue 2 years ago • 12 comments

KyleSanderson avatar Sep 21 '23 02:09 KyleSanderson

Hi @KyleSanderson, this is a relatively generic error that basically says the backend daemon was unreachable.

Do you have steps to reproduce this issue? Which microceph version did you use?

Thanks.

sabaini avatar Sep 21 '23 13:09 sabaini

I know - just saying the timeout is clearly too low. The version was whatever was on snap at the time.

KyleSanderson avatar Sep 21 '23 23:09 KyleSanderson

Not against raising the timeout in principle, but the timeout is 30s, which does not seem unreasonable for a relatively lightweight operation. What kind of delay have you been seeing? I wonder what a good value for a timeout would be.

sabaini avatar Sep 22 '23 14:09 sabaini
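
One way to put a number on the delay asked about above is to time how long the control socket takes to appear after the snap starts. This is only a minimal sketch: `wait_for_socket` is a hypothetical helper name, and it checks that the socket file exists, not that the daemon behind it answers requests.

```shell
#!/bin/sh
# Hypothetical helper: poll until a unix socket exists or a timeout expires.
# On success, prints the number of seconds waited and returns 0.
# On timeout, prints a message to stderr and returns 1.
wait_for_socket() {
    sock="$1"           # path to the socket file
    timeout="${2:-30}"  # seconds to wait; 30 mirrors the timeout discussed above
    waited=0
    while [ ! -S "$sock" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out after ${waited}s waiting for $sock" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$waited"
    return 0
}
```

For example, `wait_for_socket /var/snap/microceph/common/state/control.socket 120` would report how many seconds passed before the socket showed up, using the socket path that appears in the error output further down this thread.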

[image] I also get the same error.

supanadit avatar Oct 11 '23 03:10 supanadit

I'm actually following this guide: https://microk8s.io/docs/how-to-ceph. The error comes up when I run sudo microceph cluster bootstrap right after running sudo snap install microceph --channel=latest/edge

My server OS is Ubuntu 22.04.3 LTS.

supanadit avatar Oct 11 '23 03:10 supanadit

Are you running this on a raid0 NVMe setup?

KyleSanderson avatar Oct 11 '23 07:10 KyleSanderson

@KyleSanderson My server uses regular HDD storage with no RAID configuration.

supanadit avatar Oct 12 '23 02:10 supanadit

Oh, that's to be expected then.

KyleSanderson avatar Oct 12 '23 03:10 KyleSanderson

@KyleSanderson Hi, Solutions QA saw this in several test runs, after microceph cluster join. We do have bcache on spinning disks, but still. Do you think there's any workaround or setting we could add to eliminate this?

Test run: https://solutions.qa.canonical.com/testruns/d7c57bfe-83e2-4583-a3cc-b66cdfa2a377
Logs: https://oil-jenkins.canonical.com/artifacts/d7c57bfe-83e2-4583-a3cc-b66cdfa2a377/index.html

jeffreychang911 avatar Nov 15 '23 15:11 jeffreychang911

So the timeout increase is merged and should be available on /edge shortly; let's see if this issue rears its ugly head again.

sabaini avatar Nov 16 '23 08:11 sabaini

Thanks, just ran into this today as well. Cluster join works fine. Get this error after a reboot.

$ microceph status
Error: Failed listing disks: Get "http://control.socket/1.0/disks": dial unix /var/snap/microceph/common/state/control.socket: connect: no such file or directory

Fixed itself after a few reboots (5 times) 🤣

dvh312 avatar Nov 26 '23 20:11 dvh312
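
When the socket file is missing after a reboot, as in the error above, it may help to look at the snap's services and recent logs before rebooting again. The sketch below is an assumption-laden starting point, not a confirmed fix: `check_control_socket` is a hypothetical helper name, and it assumes the daemon runs as the snap service `microceph.daemon`.

```shell
#!/bin/sh
# Hypothetical helper: report whether the control socket exists and, if it
# does not, show the state and recent logs of the snap's services.
# Returns 0 if the socket is present, 1 otherwise.
check_control_socket() {
    # Default path taken from the error message quoted in this thread.
    sock="${1:-/var/snap/microceph/common/state/control.socket}"
    if [ -S "$sock" ]; then
        echo "control socket present: $sock"
        return 0
    fi
    echo "control socket missing: $sock"
    if command -v snap >/dev/null 2>&1; then
        # Assumption: the daemon is the snap service microceph.daemon.
        # Both commands are best effort; reading logs may require sudo.
        snap services microceph || true
        snap logs -n=50 microceph.daemon || true
    fi
    return 1
}
```

Running `check_control_socket` with no arguments checks the default path. If the service shows as inactive, `sudo snap restart microceph.daemon` is a lighter thing to try than a full reboot, under the same assumption about the service name.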

It always happens after rebooting, and I have to purge and install to fix it.

nakano57 avatar Apr 15 '24 14:04 nakano57

I believe the original timeout issue should be fixed. See ticket #342 for issues on upgrading.

sabaini avatar Jun 03 '24 16:06 sabaini