VM cannot get network address and start k8s after OS restart

Open RostislavDublin opened this issue 1 year ago • 11 comments

Description

A Colima VM created on a Mac M1 (Ventura) with the --kubernetes and --network-address options enabled loses its network address (192.168.106.2) and cannot start k8s after macOS reboots. Deleting the VM, rebooting the Mac, and starting a new VM resolves the issue, but only until the next Mac reboot.

Version

Colima Version: 0.5.2
Lima Version: 0.15.0
Qemu Version: I don't know

Operating System

  • [ ] macOS Intel <= 12 (Monterey)
  • [ ] macOS Intel >= 13 (Ventura)
  • [ ] macOS M1 <= 12 (Monterey)
  • [X] macOS M1 >= 13 (Ventura)
  • [ ] Linux

Output of colima status

~$ colima status
INFO[0000] colima is running using QEMU
INFO[0000] arch: aarch64
INFO[0000] runtime: docker
INFO[0000] mountType: sshfs
INFO[0000] address:
INFO[0000] socket: unix:///Users/Rostislav_Dublin/.colima/default/docker.sock
INFO[0000] kubernetes: enabled
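
The blank address: field in the output above is the symptom this issue describes. As a minimal sketch for spotting it from a script, the helpers below read status text on stdin and report whether an address is present. The function names are hypothetical (they are not part of Colima), and the parsing assumes one field per line in the format shown above.

```shell
# Hypothetical helper: print the value of the "address:" field from
# `colima status` text read on stdin; prints an empty line if the field is blank.
colima_address() {
  sed -n 's/^.*address:[[:space:]]*\([0-9.]*\)[[:space:]]*$/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if the status text carries a non-empty address.
colima_has_address() {
  [ -n "$(colima_address)" ]
}
```

A possible use is `colima status 2>&1 | colima_has_address || echo "no address assigned"`, where `2>&1` merges both streams so the check works whichever stream the log lines are written to.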

Reproduction Steps

  1. Create a VM with --kubernetes and --network-address enabled, deploy some k8s workloads, and feel happy...

  2. Stop and start your VM if needed, and reconfigure the CPU and memory settings with no problems.

  3. Shut down and reboot your Mac.

  4. Make sure you cannot successfully get your Kubernetes back to life anymore:

    • each time you call "colima start" it now takes too long...

    • and during the VM startup you see (in a second terminal window) a blank value in the ADDRESS column of the "colima list" output: [image]

    • and after several minutes of waiting you see the final output message: [image]

    • and if you run "docker ps -a" you see all containers (including k8s) stopped: [image]

  5. Delete your Colima VM.

  6. Reboot your Mac (reboot is mandatory!)

  7. Start a new Colima VM

  8. Now you have your k8s again... until the next Mac restart

Expected behaviour

Your VM survives OS restarts.

Additional context

No response

RostislavDublin avatar Mar 05 '23 21:03 RostislavDublin

Seems the failed network address allocation is the cause. Does this happen quite often for you?

abiosoft avatar Mar 06 '23 06:03 abiosoft

I hit the issue every single time I reboot my Mac. I deliberately tested this multiple times, and the pattern was always the same as described above:

  • I create a VM and use it happily
  • I reboot the Mac
  • I start the VM and have issues with the ADDRESS and k8s
  • I delete the VM and reboot my Mac again
  • I create a new VM and it works again... until a new Mac reboot

RostislavDublin avatar Mar 06 '23 15:03 RostislavDublin

@abiosoft, how can I help you to get more details on this?

RostislavDublin avatar Mar 06 '23 15:03 RostislavDublin

Unfortunately, exactly the same thing started happening to me today on my Mac as well.

I tried colima delete, reinstalled colima, and attempted to start again only to get stuck on the following:

$ colima start --cpu 5 --memory 10 --disk 40 --kubernetes --network-address
INFO[0000] starting colima
INFO[0000] runtime: docker+k3s
INFO[0000] preparing network ...                         context=vm
WARN[0015] error setting up network dependencies: error at 'preparing network': error running [/opt/homebrew/bin/colima daemon status default], output: "time=\"2023-03-07T17:31:48+01:00\" level=fatal msg=\"pid file not found: stat /Users/jaroslav.kubicek/.colima/default/daemon/daemon.pid: no such file or directory\"", err: "exit status 1"  context=vm
INFO[0015] creating and starting ...                     context=vm
WARN[0015] error setting up reachable IP address: vmnet socket file not found: stat /Users/jaroslav.kubicek/.colima/default/daemon/vmnet.sock: no such file or directory
> [hostagent] Waiting for the essential requirement 1 of 5: "ssh"

EDIT: this error got solved by restarting, but I'm still getting the same error as described here in the issue:

FATA[0093] error starting kubernetes: error running [lima kubectl cluster-info], output: "The connection to the server localhost:8080 was refused - did you specify the right host or port?", err: "exit status 1"

jaroslav-kubicek avatar Mar 07 '23 16:03 jaroslav-kubicek

I temporarily uninstalled Colima and returned to Docker Desktop. Such a pity. I really liked Colima's approach and would like to continue with it. Pls, ping me when the issue is fixed. Thank you for your gr8 efforts!

RostislavDublin avatar Mar 08 '23 19:03 RostislavDublin

The network address issue mainly surfaced in macOS Ventura; it was more stable in older macOS versions.

Considering there have been reports of a better experience with a bridged network, the ability to toggle between bridged and shared is being worked on.

The preference is still the shared network, and we will keep troubleshooting to find the root cause of the erratic behaviour.

abiosoft avatar Mar 09 '23 06:03 abiosoft

@RostislavDublin can you try the latest development version (brew install --head colima) and see if the issue still persists?

Thanks.

abiosoft avatar Apr 02 '23 15:04 abiosoft

this issue still exists after brew install --head colima

speedupmate avatar Apr 06 '23 07:04 speedupmate

FYI, if you have colima at commit https://github.com/abiosoft/colima/commit/20ba980d963a36cb71c5844c80caf6bcee13d7cd or later (v0.5.5 will suffice, or reinstall using --head as suggested above), then you have a workaround for this: assign a static IP via the COLIMA_IP env var at the very end of your colima.yaml file.

env:
  COLIMA_IP: 192.168.106.10
  # and any other env vars you need, if any
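
To confirm that the entry above actually landed in the config, a small shell sketch like the following can help. The helper names are hypothetical, the parsing assumes a plain unquoted COLIMA_IP: line as shown above, and the IPv4 check only tests the dotted-quad shape, not the numeric range of each octet.

```shell
# Hypothetical helper: print the COLIMA_IP value set in a colima.yaml-style file.
# Commented-out lines are ignored because the key must start the line.
configured_colima_ip() {
  sed -n 's/^[[:space:]]*COLIMA_IP:[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed if the argument looks like a dotted-quad IPv4 address.
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}
```

For example, `is_ipv4 "$(configured_colima_ip "$HOME/.colima/default/colima.yaml")" && echo "static IP configured"`. The path is an assumption based on the default profile directory seen in the logs earlier in this thread; adjust it for other profiles.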

Thanks for adding this option while the original problem is still being troubleshot!

emanuil-tolev avatar Jun 08 '23 20:06 emanuil-tolev

I haven't run all the scenarios, but could this be related to not getting a static IP on reboot? I seem at least to get this when moving my Mac from the office to the home office, but testing it full on requires some time.

A workaround where one can edit or add a network on an already configured machine would at least help, since one wouldn't need to download all the images again.

sastorsl avatar Feb 06 '24 07:02 sastorsl

See if this might be the cause: https://github.com/abiosoft/colima/issues/458#issuecomment-1989839779

norrs avatar Mar 13 '24 22:03 norrs