lima icon indicating copy to clipboard operation
lima copied to clipboard

failed to set up forwarding tcp port (`"Failed to set up forward from \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)"`)

Open kj-creater opened this issue 2 years ago • 12 comments

Description

I run nginx in the colima docker environment, and I can access it normally at the beginning, but after a period of time, I can no longer access it, and I will encounter this problem almost every time I run for a long time;

I tried using limactl to start the virtual machine and run docker, the result is the same

colima version 0.5.0 limactl version 0.14.1 os: macOS 13.1 M1 vmType: vz

I found the following log

{"level":"debug","msg":"guest agent event: {Time:2022-12-15 09:58:04.406679394 +0000 UTC LocalPortsAdded:[{IP::: Port:8081} {IP:0.0.0.0 Port:8081}] LocalPortsRemoved:[] Errors:[]}","time":"2022-12-15T17:58:05+08:00"}
{"level":"info","msg":"Forwarding TCP from [::]:8081 to 0.0.0.0:8081","time":"2022-12-15T17:58:05+08:00"}
{"level":"info","msg":"Forwarding TCP from 0.0.0.0:8081 to 0.0.0.0:8081","time":"2022-12-15T17:58:05+08:00"}
{"error":"failed to run [ssh -F /dev/null -o IdentityFile=\"/Users/xxx/.lima/_config/user\" -o IdentityFile=\"/Users/xxx/.ssh/github_rsa\" -o IdentityFile=\"/Users/xxx/.ssh/id_rsa\" -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NoHostAuthenticationForLocalhost=yes -o GSSAPIAuthentication=no -o PreferredAuthentications=publickey -o Compression=no -o BatchMode=yes -o IdentitiesOnly=yes -o Ciphers=\"^[email protected],[email protected]\" -o User=xxx -o ControlMaster=auto -o ControlPath=\"/Users/xxx/.lima/colima/ssh.sock\" -o ControlPersist=5m -T -O forward -L 0.0.0.0:8081:0.0.0.0:8081 -N -f -p 54479 127.0.0.1 --]: \"\": exit status 255","level":"warning","msg":"failed to set up forwarding tcp port 8081 (negligible if already forwarded)","time":"2022-12-15T17:58:05+08:00"}
{"error":"unexpected EOF","level":"warning","msg":"connection to the guest agent was closed unexpectedly","time":"2022-12-15T18:24:13+08:00"}
{"level":"info","msg":"Forwarding \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:24:23+08:00"}
{"error":"exit status 255","level":"warning","msg":"Failed to set up forward from \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:24:23+08:00"}
{"error":null,"level":"warning","msg":"Failed to clean up \"/Users/xxx/.lima/colima/ga.sock\" (host) after forwarding failed","time":"2022-12-15T18:24:23+08:00"}
{"error":"stat /Users/xxx/.lima/colima/ga.sock: no such file or directory","level":"warning","msg":"connection to the guest agent was closed unexpectedly","time":"2022-12-15T18:24:23+08:00"}
{"level":"info","msg":"Forwarding \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:24:33+08:00"}
{"error":"exit status 255","level":"warning","msg":"Failed to set up forward from \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:24:33+08:00"}
{"error":null,"level":"warning","msg":"Failed to clean up \"/Users/xxx/.lima/colima/ga.sock\" (host) after forwarding failed","time":"2022-12-15T18:24:33+08:00"}
{"error":"stat /Users/xxx/.lima/colima/ga.sock: no such file or directory","level":"warning","msg":"connection to the guest agent was closed unexpectedly","time":"2022-12-15T18:24:33+08:00"}
{"level":"info","msg":"Forwarding \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:24:43+08:00"}
{"error":"exit status 255","level":"warning","msg":"Failed to set up forward from \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:24:43+08:00"}
{"error":null,"level":"warning","msg":"Failed to clean up \"/Users/xxx/.lima/colima/ga.sock\" (host) after forwarding failed","time":"2022-12-15T18:24:43+08:00"}
{"error":"stat /Users/xxx/.lima/colima/ga.sock: no such file or directory","level":"warning","msg":"connection to the guest agent was closed unexpectedly","time":"2022-12-15T18:24:43+08:00"}
{"level":"info","msg":"Forwarding \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:24:53+08:00"}
{"error":"exit status 255","level":"warning","msg":"Failed to set up forward from \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:24:53+08:00"}
{"error":null,"level":"warning","msg":"Failed to clean up \"/Users/xxx/.lima/colima/ga.sock\" (host) after forwarding failed","time":"2022-12-15T18:24:53+08:00"}
{"error":"stat /Users/xxx/.lima/colima/ga.sock: no such file or directory","level":"warning","msg":"connection to the guest agent was closed unexpectedly","time":"2022-12-15T18:24:53+08:00"}
{"level":"info","msg":"Forwarding \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:25:03+08:00"}
{"error":"exit status 255","level":"warning","msg":"Failed to set up forward from \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:25:03+08:00"}
{"error":null,"level":"warning","msg":"Failed to clean up \"/Users/xxx/.lima/colima/ga.sock\" (host) after forwarding failed","time":"2022-12-15T18:25:03+08:00"}
{"error":"stat /Users/xxx/.lima/colima/ga.sock: no such file or directory","level":"warning","msg":"connection to the guest agent was closed unexpectedly","time":"2022-12-15T18:25:03+08:00"}
{"level":"info","msg":"Forwarding \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:25:13+08:00"}
{"error":"exit status 255","level":"warning","msg":"Failed to set up forward from \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:25:13+08:00"}
{"error":null,"level":"warning","msg":"Failed to clean up \"/Users/xxx/.lima/colima/ga.sock\" (host) after forwarding failed","time":"2022-12-15T18:25:13+08:00"}
{"error":"stat /Users/xxx/.lima/colima/ga.sock: no such file or directory","level":"warning","msg":"connection to the guest agent was closed unexpectedly","time":"2022-12-15T18:25:13+08:00"}
{"level":"info","msg":"Forwarding \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:25:23+08:00"}
{"error":"exit status 255","level":"warning","msg":"Failed to set up forward from \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:25:23+08:00"}
{"error":null,"level":"warning","msg":"Failed to clean up \"/Users/xxx/.lima/colima/ga.sock\" (host) after forwarding failed","time":"2022-12-15T18:25:23+08:00"}
{"error":"stat /Users/xxx/.lima/colima/ga.sock: no such file or directory","level":"warning","msg":"connection to the guest agent was closed unexpectedly","time":"2022-12-15T18:25:23+08:00"}
{"level":"info","msg":"Forwarding \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:25:33+08:00"}
{"error":"exit status 255","level":"warning","msg":"Failed to set up forward from \"/run/lima-guestagent.sock\" (guest) to \"/Users/xxx/.lima/colima/ga.sock\" (host)","time":"2022-12-15T18:25:33+08:00"}
{"error":null,"level":"warning","msg":"Failed to clean up \"/Users/xxx/.lima/colima/ga.sock\" (host) after forwarding failed","time":"2022-12-15T18:25:33+08:00"}
{"error":"stat /Users/xxx/.lima/colima/ga.sock: no such file or directory","level":"warning","msg":"connection to the guest agent was closed unexpectedly","time":"2022-12-15T18:25:33+08:00"}
colima status
FATA[0003] error retrieving current runtime: empty value
limactl ls
NAME      STATUS     SSH                VMTYPE    ARCH       CPUS    MEMORY    DISK    DIR
colima    Running    127.0.0.1:54479    vz        aarch64    2       2GiB      1GiB    ~/.lima/colima

kj-creater avatar Dec 15 '22 10:12 kj-creater

@kj-creater Some questions to better debug this,

  • Is it consistently reproducible ??
  • Your colima memory settings ??
  • Are you running just nginx in docker ??

I did faced a similar issue today when my vm had lower memory and one of my container crashed after that port forwarding didn't work. On increasing the memory i didn't face this issue. When this happened only that port had issue, ssh and other ports were still working

balajiv113 avatar Dec 15 '22 13:12 balajiv113

It is likely that the guestagent process is crashing. Maybe you can use sudo journalctl -u lima-guestagent to inspect the crash logs. (for the non-colima Docker instance)

AkihiroSuda avatar Dec 15 '22 14:12 AkihiroSuda

When this happens, all operations get stuck, eg: docker ps, colima ssh, limactl shell colima, I can't do anything, it seems like I have lost contact with the guest, I can no longer be sure what is going wrong ; Is there any log about lima-guestagent in the guest? I forcibly stopped the guest and want to find related logs. What should I do?

Is it consistently reproducible ??

It cannot be reproduced stably and irregularly, but it will appear when running for a long time

Your colima memory settings ??

2GiB memory

Are you running just nginx in docker ??

yes

@balajiv113

Maybe you can use sudo journalctl -u lima-guestagent to inspect the crash logs. (for the non-colima Docker instance)

When the error occurs, I can't ssh to the guest, I will try again with the ubuntu image

@AkihiroSuda

lima is great, she helped me out with a lot of issues and I would love to assist with this

kj-creater avatar Dec 16 '22 01:12 kj-creater

I guess it is caused by docker , the same environment using containerd has not yet found the problem

kj-creater avatar Dec 28 '22 09:12 kj-creater

For reference, similar but not exactly the same:

{"level":"debug","msg":"guest agent event: {Time:2023-01-24 23:46:41.635597719 +0000 UTC LocalPortsAdded:[{IP:0.0.0.0 Port:23761} {IP:0.0.0.0 Port:9000} {IP:0.0.0.0 Port:9443}] LocalPortsRemoved:[] Errors:[]}","time":"2023-01-25T00:46:42+01:00"}
{"level":"info","msg":"Forwarding TCP from 0.0.0.0:23761 to 0.0.0.0:23761","time":"2023-01-25T00:46:42+01:00"}
{"error":"failed to run [ssh -F /dev/null -o IdentityFile=\"/Users/miquela/Library/Application Support/rancher-desktop/lima/_config/user\" -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NoHostAuthenticationForLocalhost=yes -o GSSAPIAuthentication=no -o PreferredAuthentications=publickey -o Compression=no -o BatchMode=yes -o IdentitiesOnly=yes -o Ciphers=\"^[email protected],[email protected]\" -o User=miquela -o ControlMaster=auto -o ControlPath=\"/Users/miquela/Library/Application Support/rancher-desktop/lima/0/ssh.sock\" -o ControlPersist=5m -T -O forward -L 0.0.0.0:23761:0.0.0.0:23761 -N -f -p 64925 127.0.0.1 --]: \"\": exit status 255","level":"warning","msg":"failed to set up forwarding tcp port 23761 (negligible if already forwarded)","time":"2023-01-25T00:46:42+01:00"}
{"level":"info","msg":"Forwarding TCP from 0.0.0.0:9000 to 0.0.0.0:9000","time":"2023-01-25T00:46:42+01:00"}
{"error":"failed to run [ssh -F /dev/null -o IdentityFile=\"/Users/miquela/Library/Application Support/rancher-desktop/lima/_config/user\" -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NoHostAuthenticationForLocalhost=yes -o GSSAPIAuthentication=no -o PreferredAuthentications=publickey -o Compression=no -o BatchMode=yes -o IdentitiesOnly=yes -o Ciphers=\"^[email protected],[email protected]\" -o User=miquela -o ControlMaster=auto -o ControlPath=\"/Users/miquela/Library/Application Support/rancher-desktop/lima/0/ssh.sock\" -o ControlPersist=5m -T -O forward -L 0.0.0.0:9000:0.0.0.0:9000 -N -f -p 64925 127.0.0.1 --]: \"\": exit status 255","level":"warning","msg":"failed to set up forwarding tcp port 9000 (negligible if already forwarded)","time":"2023-01-25T00:46:42+01:00"}
{"level":"info","msg":"Forwarding TCP from 0.0.0.0:9443 to 0.0.0.0:9443","time":"2023-01-25T00:46:42+01:00"}
{"error":"failed to run [ssh -F /dev/null -o IdentityFile=\"/Users/miquela/Library/Application Support/rancher-desktop/lima/_config/user\" -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NoHostAuthenticationForLocalhost=yes -o GSSAPIAuthentication=no -o PreferredAuthentications=publickey -o Compression=no -o BatchMode=yes -o IdentitiesOnly=yes -o Ciphers=\"^[email protected],[email protected]\" -o User=miquela -o ControlMaster=auto -o ControlPath=\"/Users/miquela/Library/Application Support/rancher-desktop/lima/0/ssh.sock\" -o ControlPersist=5m -T -O forward -L 0.0.0.0:9443:0.0.0.0:9443 -N -f -p 64925 127.0.0.1 --]: \"\": exit status 255","level":"warning","msg":"failed to set up forwarding tcp port 9443 (negligible if already forwarded)","time":"2023-01-25T00:46:42+01:00"}

In my case sometimes Rancher Desktop/Lima (unsure who's the culprit) leaves hanged ssh processes that prevents further port forwarding and prevents the port forwarding to behave as intended. In this case the solution was to kill the rogue ssh processes from the host computer and following port forwarding actions will then succeed.

IMHO Rancher Desktop / Lima should be more explicit about this misbehavior and prevent the new container to be executed if the port is in use.

madrover avatar Jan 25 '23 12:01 madrover

I'm seeing something similar to the original report, but only using lima. I don't see the same when I run it with docker desktop.

  • limactl version 0.15.1
  • macOS version 13.4 M1

I start it up with

curl -s https://raw.githubusercontent.com/lima-vm/lima/master/examples/docker-rootful.yaml | \
    yq '.mounts |= map(select(.location == "~").writable = true) | .cpus = 8 | with(.memory; .  = "12Gib" | . style="double") | with(.disk; .  = "150Gib" | . style="double") | .vmType = "vz" | .mountType = "virtiofs" | .provision += [{"mode" : "system", "script": "docker run --privileged --rm tonistiigi/binfmt --install all"}]' | \
    limactl start --name=docker -

I start up a handful of containers (postgres, redis, kafka) and when I run tests against them I get an eventual crash. My ssh connection crashes, however, I can reconnect and the containers are all still running. When I monitor the docker stats while it crashes in the VM the resources are well below the allocated values.

Running sudo journalctl -f | tee journalctl_logs there's no logs except a bunch of logs like the below for each container id

lima-docker systemd[1]: run-docker-runtime\x2drunc-moby-73076343d4b7783956a6b3b67d80a6cbe81a7ab940988040b3b13b677bb8cbc6-runc.eNeiU9.mount: Deactivated successfully.

Looking at the guest agent logs, there's nothing after when it started

$ sudo journalctl -u lima-guestagent
lima-docker systemd[1]: Started lima-guestagent.
lima-docker lima-guestagent[2407]: time="2023-05-31T14:47:34Z" level=info msg="event tick: 3s"
lima-docker lima-guestagent[2407]: time="2023-05-31T14:47:34Z" level=info msg="serving the guest agent on \"/run/lima-guestagent.sock\""

Using qemu instead of vz (removing | .vmType = "vz" | .mountType = "virtiofs" from above) has the same effect, however, slightly different errors. The guest agent appears to start back up (I see "Forwarding \"/run/lima-guestagent.sock\" (guest) to \"/Users/{MY_USER}/.lima/docker/ga.sock\" (host)" shortly after the unexpected EOF), however, the docker socket seems inaccessible

❯ docker container ls -a
Cannot connect to the Docker daemon at unix:///Users/{MY_USER}/.lima/docker/sock/docker.sock. Is the docker daemon running?

I can't come up with simple reproducible steps, but hopefully that's enough to point me in the right direction. My gut tells me there's something about how I'm communicating with the containers that's somehow causing problems

seth-epps avatar May 31 '23 16:05 seth-epps

@seth-epps Could you share the machine details lile cpu and ram ?

Also while running those containers were there high load apps running on host machine ??

balajiv113 avatar May 31 '23 16:05 balajiv113

It's an M1 Max with 10 cores and 32Gb ram

Nothing particularly high load. Just chrome, vscode, and slack. Activity monitor shows CPU load consistently low. I give the vm 8 cpus and 12Gib ram which, when watching docker stats and top, seems like it's not even close to fully utilized. I allocate the same to docker desktop when running the same test that doesn't crash.

seth-epps avatar May 31 '23 16:05 seth-epps

@seth-epps facing the same issue here with similar specs for my host machine. Any updates on this?

ejames17 avatar Aug 30 '23 19:08 ejames17

No updates from me. Sadly it was a blocker for some of my work so I had to switch back to docker desktop 😔 If I can come up with a simple reproducible set of steps I'd dig into it further

seth-epps avatar Sep 11 '23 14:09 seth-epps

I start up a handful of containers (postgres, redis, kafka) and when I run tests against them I get an eventual crash.

I experienced the exact same issue with our test suite, except that we had a MongoDB container. It turned out that our tests were leaking MongoDB connections. After ~500 tests/ open connections, Rancher just "silently" crashed with the mentioned error logs.

After fixing the connection leak, everything seems to be pretty stable 🙏

lacco avatar Oct 18 '23 15:10 lacco

Has the same issue when using VSCode devcontainer. The SSH just crashes when VSCode creates multiple connections for its extensions/debug communication.

My system info:

Machine: Macbook Pro M1
RAM: 16GB (allocated 8GB for the VM, using Rancher Desktop)
Rancher Desktop: 1.11.1

trinhpham avatar Dec 28 '23 11:12 trinhpham