
Stuck network when using multiple connections with k3s

JanPokorny opened this issue 6 months ago • 8 comments (status: Open)

Summary

When running a non-trivial web app in a Lima VM using the k3s template, the network stops responding. This seems to happen when multiple persistent HTTP connections are established. With an app that loads many resources, this can be triggered simply by opening it in two tabs at once.

Reproduction

  1. Prepare these files (somewhere under your ~):

Dockerfile

FROM python:3.11-slim

ARG CSS_COUNT=1000
ENV CSS_COUNT=${CSS_COUNT}

WORKDIR /app

RUN cat <<EOF >main.py
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

class NoCacheStaticFiles(StaticFiles):
    async def get_response(self, path: str, scope):
        response = await super().get_response(path, scope)
        response.headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
        response.headers["Pragma"] = "no-cache"
        response.headers["Expires"] = "0"
        return response

app = FastAPI()
app.mount("/", NoCacheStaticFiles(directory="static", html=True), name="static")
EOF

RUN mkdir static

RUN echo '<!DOCTYPE html>' > static/index.html && \
    echo '<html><head><title>Test Lima Port Bug</title>' >> static/index.html && \
    for i in $(seq 0 $((${CSS_COUNT} - 1))); do \
        echo "  <link rel=\"stylesheet\" href=\"$i.css\">" >> static/index.html; \
    done && \
    echo '</head><body><h1>Hello from FastAPI static site!</h1></body></html>' >> static/index.html

RUN for i in $(seq 0 $((${CSS_COUNT} - 1))); do \
        echo -n "/* " > static/$i.css && \
        head -c 500000 /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 80 | head -n 100 >> static/$i.css && \
        echo " */" >> static/$i.css; \
    done

RUN pip install fastapi uvicorn

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

k8s.yaml

apiVersion: v1
kind: Service
metadata:
  name: repro-service
spec:
  type: NodePort
  selector:
    app: repro
  ports:
    - port: 8000
      nodePort: 31833
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-repro
spec:
  replicas: 1
  selector:
    matchLabels:
      app: repro
  template:
    metadata:
      labels:
        app: repro
    spec:
      containers:
        - name: repro
          image: repro:local
          ports:
            - containerPort: 8000

  2. Build the image and start the server (this needs a local docker CLI to build the image; I use Colima, but it probably does not matter):
docker build -t repro:local .
docker save repro:local >repro.tar
limactl --tty=false start template://k3s --name=repro --mount=~
limactl --tty=false shell repro -- sudo ctr images import repro.tar
limactl --tty=false shell repro -- kubectl apply -f k8s.yaml
  3. Open http://localhost:31833/ in a browser. It will load just fine the first time. Leave the tab open and open http://localhost:31833/ in another tab. Repeat and open a few tabs like this until you notice that the page stops loading, stuck in a "pending" state. The stuck pages will sometimes load eventually after 20 s or so; some will time out. (A browser-free way to hold many keep-alive connections is sketched below.)
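
If you want to trigger this without a browser, a minimal sketch along these lines should work (not part of the original report; the host, port, and connection count are assumptions). It holds a number of keep-alive connections open and then probes whether a fresh connection still gets through the forwarded NodePort.

#!/usr/bin/env python3
# Hypothetical browser-free probe (not from the original report).
# Hold several keep-alive connections open, then check whether a brand-new
# connection still gets through the forwarded NodePort (31833 from k8s.yaml).
import socket
import sys

HOST, PORT = "127.0.0.1", 31833
HELD = 30  # assumed count; roughly what a few browser tabs would hold open

def open_keepalive(i: int) -> socket.socket:
    s = socket.create_connection((HOST, PORT), timeout=5)
    req = f"GET /{i}.css HTTP/1.1\r\nHost: localhost\r\nConnection: keep-alive\r\n\r\n"
    s.sendall(req.encode())
    s.recv(65536)  # read (part of) the response, then keep the socket open
    return s

held = [open_keepalive(i) for i in range(HELD)]
print(f"Holding {len(held)} keep-alive connections; trying one more...")

try:
    probe = socket.create_connection((HOST, PORT), timeout=10)
    probe.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
    status_line = probe.recv(65536).split(b"\r\n", 1)[0].decode()
    print("New connection OK:", status_line)
except OSError as exc:
    print("New connection failed or hung:", exc)
    sys.exit(1)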

What I discovered

It appears that the new tabs are unable to establish TCP connections to the application. This seems to be related to the number of existing persistent HTTP connections (Connection: keep-alive). When I force Connection: close in the server, the problem disappears. That is also why the bug only manifests when multiple tabs are open: browsers hold separate connections per tab. The huge number of dummy CSS files is there just to force the browser to open its maximum number of connections.
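
For reference, a minimal sketch of what forcing Connection: close looks like on the server side (a hypothetical variant of main.py from the Dockerfile above, with the no-cache wrapper omitted for brevity):

# Hypothetical variant of main.py: ask clients not to reuse connections.
from fastapi import FastAPI, Request
from fastapi.staticfiles import StaticFiles

app = FastAPI()

@app.middleware("http")
async def force_connection_close(request: Request, call_next):
    response = await call_next(request)
    # Signal that the TCP connection should be closed after this response.
    response.headers["Connection"] = "close"
    return response

app.mount("/", StaticFiles(directory="static", html=True), name="static")

With something like this, every request rides its own short-lived connection, which matches the observation above that the hang then goes away.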

Notably, this bug does not happen when using kubectl port-forward instead of the NodePort, so the problem lies somewhere in the Lima networking stack, or perhaps in the way it interacts with the k3s networking stack.

Versions

  • macOS 15.5 (24F74)
  • limactl version 1.1.1 (installed from Homebrew)
  • happens on both VZ and QEMU VMs

JanPokorny · Jun 03 '25 13:06

I tried different versions of Lima, and it appears that this is a regression between 1.0.7 and 1.1.0.

JanPokorny · Jun 03 '25 16:06

I initially had some trouble reproducing it; I was opening lots of tabs with http://localhost:31833 quickly, and they all worked. But when I waited a bit between opening tabs, I would eventually see the failure.

The corresponding error in the hostagent log is:

{"error":"close tcp 127.0.0.1:6443-\u003e127.0.0.1:55606: shutdown: socket is not connected","level":"debug","msg":"failed to call CloseRead","time":"2025-06-03T10:18:56-07:00"}

It seems to be a problem with the gRPC port forwarder. When I disabled it, I could no longer reproduce the issue:

export LIMA_SSH_PORT_FORWARDER=true

It needs to be set before you start the instance. Could you please try and confirm that this "fixes" the issue for you as well?
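
For the repro above that would be, for example (reusing the start command from the reproduction steps; stop the instance first if it is already running):

export LIMA_SSH_PORT_FORWARDER=true
limactl --tty=false start template://k3s --name=repro --mount=~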

jandubois · Jun 03 '25 19:06

Thanks for the bug report. I think the problem is that the connection is not reused when keep-alive comes into play; we still dial a new connection.

I will check on this and try to fix it.

balajiv113 · Jun 04 '25 03:06

@jandubois Yes, LIMA_SSH_PORT_FORWARDER=true works. Thank you for providing the workaround!

JanPokorny · Jun 04 '25 09:06

It seems like I hit the same problem, with heavy use of S3 connections from Chrome to MinIO.

mabels · Jun 11 '25 15:06

@mabels And does switching to the SSH port forwarder fix things for you as well?

jandubois · Jun 11 '25 16:06

It does


mabels · Jun 12 '25 13:06

I've added the "priority/high" label to this issue. I think we need to either find a fix for it for 1.1.2, or revert the default back to SSH.

@balajiv113 Do you expect to have time to look into this, or would you rather revert the default first, to have more time?

jandubois · Jun 12 '25 18:06

> Connection: keep-alive

Is it possible to reproduce this issue without using k3s?

AkihiroSuda · Jun 30 '25 07:06

I have the problem without k3s as well: just a service on the VM, in my case a Docker container running MinIO.

mabels · Jun 30 '25 07:06

I think I found a minimal repro of this issue:

lima python3 -m http.server
telnet localhost 8000

"Connection closed by foreign host." appears after pressing the RET key:

  • SSH: once
  • gRPC: 3 times

The cause seems to be that the gRPC portfwd does not implement TCP half-close.
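
For illustration only (Lima's forwarder is written in Go, so this is not its actual code), the behaviour a TCP proxy needs for half-close is: when one side stops sending, shut down only the write direction towards the other side instead of tearing down the whole connection. A Python sketch:

# Illustrative half-close handling in a bidirectional TCP relay.
import socket
import threading

def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes src -> dst; on EOF propagate a half-close instead of closing."""
    while True:
        data = src.recv(65536)
        if not data:
            # src finished sending: shut down only our write side towards dst,
            # so dst can keep sending data back through the other pump.
            try:
                dst.shutdown(socket.SHUT_WR)
            except OSError:
                pass
            return
        dst.sendall(data)

def relay(client: socket.socket, upstream: socket.socket) -> None:
    a = threading.Thread(target=pump, args=(client, upstream))
    b = threading.Thread(target=pump, args=(upstream, client))
    a.start(); b.start()
    a.join(); b.join()   # only after both directions have hit EOF...
    client.close()       # ...fully close the sockets
    upstream.close()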

AkihiroSuda · Jul 03 '25 08:07

Implementing TCP half-close may require non-trivial changes to the TunnelMessage messages

> I think we need to either find a fix for it for 1.1.2, or revert the default back to SSH.

If we are going to revert the default back to SSH again, we should probably never promote gRPC to the default again. The default mode table already looks quite clumsy: https://github.com/lima-vm/lima/blob/53d718628f519dc6702f99473c5de343ac46ce62/website/content/en/docs/config/port.md?plain=1#L14-L22

AkihiroSuda · Jul 03 '25 09:07

Hi,

I don't think it is about half-open connections: my error happens over HTTP, and the HTTP spec does not allow one-sided shutdowns. Besides this, I would assume that the problem is something with TCP_DELAY.

If SSH can do it, don't give up.

Meno


mabels · Jul 03 '25 16:07

We still have:

  • https://github.com/lima-vm/lima/issues/3685

AkihiroSuda · Jul 04 '25 06:07