
CI jobs often fail with minikube error `X Exiting due to GUEST_PROVISION`

Open nixpanic opened this issue 3 years ago • 26 comments

Describe the bug

When running CI jobs, minikube regularly fails with the following error:

* Failed to start kvm2 VM. Running "minikube delete" may fix it: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute

X Exiting due to GUEST_PROVISION: Failed to start host: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute

This prevents the job from continuing, and a /retest ... is needed.

Logs

From mini-e2e_k8s-1.20/1523:

* Creating kvm2 VM (CPUs=8, Memory=14336MB, Disk=20000MB) ...
* Deleting "minikube" in kvm2 ...
! StartHost failed, but will try again: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute
* Creating kvm2 VM (CPUs=8, Memory=14336MB, Disk=20000MB) ...
* Failed to start kvm2 VM. Running "minikube delete" may fix it: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute

X Exiting due to GUEST_PROVISION: Failed to start host: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute

nixpanic avatar Jul 09 '21 13:07 nixpanic
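To tell this particular failure apart from other CI breakage, the job output could be scanned for the two signature strings that appear in the log above. The following is only a minimal sketch: the function name and the returned labels are hypothetical and not part of ceph-csi or minikube.

```python
import re

# Signature strings copied from the log excerpt in this issue.
GUEST_PROVISION = re.compile(r"Exiting due to GUEST_PROVISION")
NO_IP = re.compile(r"IP not available after waiting")


def classify_minikube_failure(log_text: str) -> str:
    """Return a coarse label for a minikube start log.

    The labels are hypothetical; they only illustrate separating the
    "VM never got an IP" case from other provisioning failures.
    """
    if GUEST_PROVISION.search(log_text):
        if NO_IP.search(log_text):
            return "guest-provision-no-ip"
        return "guest-provision-other"
    return "other"
```

A label like this could be attached to the job result, so that repeated occurrences are easier to count than by collecting comment links by hand.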

https://github.com/ceph/ceph-csi/pull/2343#issuecomment-891825471 failed again (logs)

nixpanic avatar Aug 03 '21 12:08 nixpanic

https://github.com/ceph/ceph-csi/pull/2308#issuecomment-891856084 failed here too (logs)

nixpanic avatar Aug 03 '21 13:08 nixpanic

https://github.com/ceph/ceph-csi/pull/2354#issuecomment-892400000 as well (logs)

nixpanic avatar Aug 04 '21 06:08 nixpanic

failed in https://github.com/ceph/ceph-csi/pull/2350#issuecomment-892500553 too (logs)

nixpanic avatar Aug 04 '21 09:08 nixpanic

https://github.com/ceph/ceph-csi/pull/2341#issuecomment-892516420 hit this too (logs)

nixpanic avatar Aug 04 '21 09:08 nixpanic

@nixpanic as we are hitting this frequently, is it a good idea to add a wrapper to retry?

Madhu-1 avatar Aug 04 '21 09:08 Madhu-1

https://github.com/ceph/ceph-csi/pull/2322#issuecomment-892634477 hit this (logs)

nixpanic avatar Aug 04 '21 13:08 nixpanic

@nixpanic as we are hitting this frequently, is it a good idea to add a wrapper to retry?

I don't know. There is already an automated retry by minikube. We should try to identify the cause and work on preventing it. Gathering the hostnames of the CentOS CI bare-metal machines might give a clue. Different groups of hosts have different hardware, and are in different subnets (https://wiki.centos.org/QaWiki/PubHardware).

nixpanic avatar Aug 04 '21 13:08 nixpanic
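The two ideas discussed here (an outer retry wrapper, and recording which CI host a failure happened on) could be combined roughly as follows. This is a sketch under stated assumptions, not what the ceph-csi CI scripts do: the function names, the attempt count, the delay, and the exact `minikube start` arguments are all assumptions. Only `minikube delete` is taken from the error message's own advice.

```python
import socket
import subprocess
import time
from typing import Callable, Sequence


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def start_with_retry(
    # Assumed arguments; the real CI job passes its own flags.
    start_cmd: Sequence[str] = ("minikube", "start", "--driver=kvm2"),
    cleanup_cmd: Sequence[str] = ("minikube", "delete"),
    attempts: int = 3,            # assumed value
    delay_seconds: float = 30.0,  # assumed value
    run: Callable[[Sequence[str]], int] = run_command,
) -> bool:
    """Try to start minikube, cleaning up between failed attempts.

    The hostname is printed on every failure so that failures can later
    be correlated with groups of CI machines.
    """
    host = socket.gethostname()
    for attempt in range(1, attempts + 1):
        if run(start_cmd) == 0:
            return True
        print(f"minikube start failed on host {host} (attempt {attempt}/{attempts})")
        if attempt < attempts:
            # The error message itself suggests "minikube delete" may help.
            run(cleanup_cmd)
            time.sleep(delay_seconds)
    return False
```

Passing `run` as a parameter keeps the wrapper easy to exercise without a real hypervisor. Note that a wrapper like this only hides the symptom; the hostname output is what would help with finding the cause.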

https://github.com/ceph/ceph-csi/pull/2341#issuecomment-892773815 as well (logs)

nixpanic avatar Aug 04 '21 15:08 nixpanic

https://github.com/ceph/ceph-csi/pull/2339#issuecomment-892851474 hit this too (logs)

nixpanic avatar Aug 04 '21 17:08 nixpanic

https://github.com/ceph/ceph-csi/pull/2351#issuecomment-893288535 has seen this

nixpanic avatar Aug 05 '21 08:08 nixpanic

2x even! https://github.com/ceph/ceph-csi/pull/2351#issuecomment-893288932

nixpanic avatar Aug 05 '21 08:08 nixpanic

backports sometimes hit this too, like https://github.com/ceph/ceph-csi/pull/2369#issuecomment-893454809

nixpanic avatar Aug 05 '21 13:08 nixpanic

hit it again https://github.com/ceph/ceph-csi/pull/2374#issuecomment-895014980

Madhu-1 avatar Aug 09 '21 07:08 Madhu-1

once more https://github.com/ceph/ceph-csi/pull/2376#issuecomment-895195300

nixpanic avatar Aug 09 '21 12:08 nixpanic

https://github.com/ceph/ceph-csi/pull/2448#issuecomment-908397774 failed too

nixpanic avatar Aug 30 '21 14:08 nixpanic

I don't know if it was by chance or the real workaround, but after generating the ssh keys locally via ssh-keygen it worked now for me for the first time after 20-25 different attempts.

Is this something that we can try?

pkalever avatar Sep 16 '21 06:09 pkalever

I don't know if it was by chance or the real workaround, but after generating the ssh keys locally via ssh-keygen it worked now for me for the first time after 20-25 different attempts.

Is this something that we can try?

Just want to be sure how generating ssh keys will fix this issue?

Madhu-1 avatar Sep 16 '21 06:09 Madhu-1

I don't know if it was by chance or the real workaround, but after generating the ssh keys locally via ssh-keygen it worked now for me for the first time after 20-25 different attempts. Is this something that we can try?

Just want to be sure how generating ssh keys will fix this issue?

Yac! You are right! It doesn't make any sense.

The other thing I tried was

  1. Flushing iptables locally
  2. Disabling SELinux

Maybe it worked because of this!

pkalever avatar Sep 16 '21 06:09 pkalever

Today I'm hitting this again locally, and none of the workarounds I guessed at earlier are working :-(

pkalever avatar Sep 17 '21 07:09 pkalever

adding minikube logs when this fails might be helpful to see what's happening.

Madhu-1 avatar Sep 22 '21 12:09 Madhu-1
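Collecting the logs on failure might look like the sketch below. It relies on the `minikube logs --file=logs.txt` invocation that minikube itself recommends in its error output; the function names and the default start command are assumptions, and when the VM never obtained an IP the collected logs may contain little.

```python
import subprocess
from typing import Callable, Sequence


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def start_and_collect_logs(
    start_cmd: Sequence[str] = ("minikube", "start"),  # assumed arguments
    log_file: str = "logs.txt",
    run: Callable[[Sequence[str]], int] = run_command,
) -> bool:
    """Start minikube; on failure, save `minikube logs` output to log_file."""
    if run(start_cmd) == 0:
        return True
    # Best effort: the exit code of the log collection is ignored.
    run(("minikube", "logs", f"--file={log_file}"))
    return False
```

The resulting file could then be archived as a CI job artifact.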


! StartHost failed, but will try again: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute
* Creating kvm2 VM (CPUs=8, Memory=14336MB, Disk=32768MB) ...
* Failed to start kvm2 VM. Running "minikube delete" may fix it: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute

X Exiting due to GUEST_PROVISION: Failed to start host: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute
*
    * If the above advice does not help, please let us know:
      https://github.com/kubernetes/minikube/issues/new/choose

    * Please run `minikube logs --file=logs.txt` and attach logs.txt to the GitHub issue.

Madhu-1 avatar Sep 22 '21 12:09 Madhu-1

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Oct 22 '21 21:10 github-actions[bot]

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

github-actions[bot] avatar Oct 30 '21 21:10 github-actions[bot]

This is definitely not fixed yet :disappointed:

nixpanic avatar Nov 01 '21 08:11 nixpanic

It seems the same problem is reported in kubernetes/minikube#11459 too.

nixpanic avatar Jan 17 '22 09:01 nixpanic

This has not happened for a long time.

nixpanic avatar Jun 02 '23 12:06 nixpanic