ceph-csi
CI jobs often fail with minikube error `X Exiting due to GUEST_PROVISION`
Describe the bug
When running CI jobs, minikube regularly fails with the following error:
* Failed to start kvm2 VM. Running "minikube delete" may fix it: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute
X Exiting due to GUEST_PROVISION: Failed to start host: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute
This prevents the job from continuing, and a `/retest ...` is needed.
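To tell this flake apart from real test failures, a CI log could be scanned for the fatal line quoted above. The following is a minimal sketch, assuming the log text is available as a string; the helper name `is_guest_provision_ip_timeout` is hypothetical, and the pattern only covers the wording seen in this issue. It deliberately does not match the `! StartHost failed, but will try again` line, because minikube follows that one with its own second attempt.

```python
import re

# Fatal line as printed in this issue; other minikube versions may word it differently.
_FATAL_IP_TIMEOUT = re.compile(
    r"^X Exiting due to GUEST_PROVISION:.*IP not available after waiting",
    re.MULTILINE,
)


def is_guest_provision_ip_timeout(log_text: str) -> bool:
    """Return True if the log contains the fatal GUEST_PROVISION / missing-IP error."""
    # Only lines starting with "X Exiting" match; the "! StartHost failed,
    # but will try again" line is minikube's internal retry and is not fatal.
    return _FATAL_IP_TIMEOUT.search(log_text) is not None
```

A CI job could run this over the captured `minikube start` output and, for example, label the run as an infrastructure flake rather than a test failure.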
Logs
From mini-e2e_k8s-1.20/1523:
* Creating kvm2 VM (CPUs=8, Memory=14336MB, Disk=20000MB) ...
* Deleting "minikube" in kvm2 ...
! StartHost failed, but will try again: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute
* Creating kvm2 VM (CPUs=8, Memory=14336MB, Disk=20000MB) ...
* Failed to start kvm2 VM. Running "minikube delete" may fix it: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute
X Exiting due to GUEST_PROVISION: Failed to start host: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute
https://github.com/ceph/ceph-csi/pull/2343#issuecomment-891825471 failed again (logs)
https://github.com/ceph/ceph-csi/pull/2308#issuecomment-891856084 failed here too (logs)
https://github.com/ceph/ceph-csi/pull/2354#issuecomment-892400000 as well (logs)
failed in https://github.com/ceph/ceph-csi/pull/2350#issuecomment-892500553 too (logs)
https://github.com/ceph/ceph-csi/pull/2341#issuecomment-892516420 hit this too (logs)
@nixpanic as we are hitting this frequently, is it a good idea to add a retry wrapper?
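For reference, the retry wrapper asked about above could look roughly like the sketch below. It assumes the CI can call a small Python helper; the function name and its parameters are hypothetical, and the real `minikube start` flags used by the jobs are not shown in this issue. Note that minikube already retries once on its own, as the `! StartHost failed, but will try again` line in the log shows, so this would only add further attempts on top of that.

```python
import subprocess
import sys
import time


def start_minikube_with_retry(start_cmd, attempts=3, delay_seconds=30,
                              cleanup_cmd=("minikube", "delete")):
    """Run start_cmd up to `attempts` times; return True on the first success.

    Between failed attempts, cleanup_cmd is run (by default "minikube delete",
    which the minikube error message itself suggests) and the helper sleeps
    for delay_seconds before trying again.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(list(start_cmd))
        if result.returncode == 0:
            return True
        print(f"attempt {attempt}/{attempts} failed (exit code {result.returncode})",
              file=sys.stderr)
        if attempt < attempts:
            # Exit code of the cleanup is ignored; the next start attempt decides.
            subprocess.run(list(cleanup_cmd))
            time.sleep(delay_seconds)
    return False
```

A call would look like `start_minikube_with_retry(["minikube", "start", "--driver=kvm2"])`, with the job's actual flags substituted.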
https://github.com/ceph/ceph-csi/pull/2322#issuecomment-892634477 hit this (logs)
I don't know. There is already an automated retry by minikube. We should try to identify the cause and work on preventing it. Gathering the hostnames of the CentOS CI bare-metal machines might give a clue. Different groups of hosts have different hardware, and are in different subnets (https://wiki.centos.org/QaWiki/PubHardware).
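One way to follow up on the hostname idea is to record, for each CI run, the address of the bare-metal host it ran on and whether minikube hit this error, then compare failure counts per subnet. This is a minimal sketch, assuming such records exist as (IP address, failed) pairs and that a /24 prefix is a reasonable grouping; neither the record format nor the prefix length comes from the CentOS CI setup, and the helper name `failures_by_subnet` is hypothetical.

```python
import ipaddress
from collections import defaultdict


def failures_by_subnet(runs, prefix_len=24):
    """Group CI runs by the subnet of the host they ran on.

    `runs` is an iterable of (host_ip, failed) pairs. Returns a dict mapping
    each subnet (as a string) to a (failures, total_runs) tuple.
    """
    stats = defaultdict(lambda: [0, 0])
    for host_ip, failed in runs:
        # strict=False turns a host address into its enclosing network.
        subnet = str(ipaddress.ip_network(f"{host_ip}/{prefix_len}", strict=False))
        stats[subnet][1] += 1
        if failed:
            stats[subnet][0] += 1
    return {subnet: (fails, total) for subnet, (fails, total) in stats.items()}
```

For example, with made-up documentation addresses (RFC 5737), `failures_by_subnet([("192.0.2.10", True), ("192.0.2.11", False), ("198.51.100.7", False)])` gives one failure out of two runs for `192.0.2.0/24` and none out of one for `198.51.100.0/24`. A subnet with a clearly higher ratio would point at a specific group of hosts.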
https://github.com/ceph/ceph-csi/pull/2341#issuecomment-892773815 as well (logs)
https://github.com/ceph/ceph-csi/pull/2339#issuecomment-892851474 hit this too (logs)
https://github.com/ceph/ceph-csi/pull/2351#issuecomment-893288535 has seen this
2x even! https://github.com/ceph/ceph-csi/pull/2351#issuecomment-893288932
backports sometimes hit this too, like https://github.com/ceph/ceph-csi/pull/2369#issuecomment-893454809
hit it again https://github.com/ceph/ceph-csi/pull/2374#issuecomment-895014980
once more https://github.com/ceph/ceph-csi/pull/2376#issuecomment-895195300
https://github.com/ceph/ceph-csi/pull/2448#issuecomment-908397774 failed too
I don't know if it was by chance or a real workaround, but after generating SSH keys locally with ssh-keygen, it worked for me for the first time after 20-25 attempts.
Is this something that we can try?
Just want to be sure: how would generating SSH keys fix this issue?
Yac! You are right! It doesn't make any sense.
The other things I tried were:
- Flushing iptables locally
- Disabling SELinux
Maybe it worked because of one of these!
Today I'm hitting this again locally, and none of the guessed workarounds I mentioned earlier are working :-(
Adding the minikube logs when this fails might help show what is happening.
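A minimal sketch of collecting those logs automatically, using the `minikube logs --file=logs.txt` command that minikube itself recommends when it exits with this error. It assumes the CI can wrap the start step in a Python helper; the function name is hypothetical, and uploading `logs.txt` as a job artifact is left to the CI configuration.

```python
import subprocess
import sys


def start_and_collect_logs(start_cmd,
                           logs_cmd=("minikube", "logs", "--file=logs.txt")):
    """Run start_cmd; if it fails, run logs_cmd so the logs can be attached.

    Returns the exit code of start_cmd so the caller can still fail the job.
    """
    result = subprocess.run(list(start_cmd))
    if result.returncode != 0:
        print(f"start failed (exit code {result.returncode}), collecting logs",
              file=sys.stderr)
        # The exit code of the log collection is ignored on purpose.
        subprocess.run(list(logs_cmd))
    return result.returncode
```

Returning the original exit code keeps the job failing as before; the helper only adds the log file as a side effect.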
! StartHost failed, but will try again: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute
* Creating kvm2 VM (CPUs=8, Memory=14336MB, Disk=32768MB) ...
* Failed to start kvm2 VM. Running "minikube delete" may fix it: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute
X Exiting due to GUEST_PROVISION: Failed to start host: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute
* If the above advice does not help, please let us know:
  https://github.com/kubernetes/minikube/issues/new/choose
* Please run `minikube logs --file=logs.txt` and attach logs.txt to the GitHub issue.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
This is definitely not fixed yet :disappointed:
It seems this has also been reported upstream as kubernetes/minikube#11459.
This has not happened for a long time.