mirrord TCP Steal/mirror doesn't work with shared sockets

Bug Description

When a parent process binds a socket, then children sockets listen/accept on that socket mirrord doesn't intercept those calls because the fd isn't managed (doesn't exist in our fork). Using this issue to document use cases where this happens for now:

uvicorn when using --reload flag.

Steps to Reproduce

from fastapi import FastAPI
import sys
app = FastAPI()


@app.get("/")
async def root():
    print("cake")
    return {"message": "Hello World"}


@app.get("/hello/{name}")
async def say_hello(name: str):
    print("pake")
    return {"message": f"Hello {name}"}

run with mirrord exec --target pod/pod uvicorn -- --reload --port 80 main:app

Backtrace

No response

Relevant Logs

No response

Your operating system and version

macOS

Local process

python

Local process version

No response

Additional Info

No response

Dec 28 '22 08:12 aviramha

Possible solutions:

hooking fork/exec, then passing down the SOCKET/FILES fd struct as environment variable for fds that aren't CLOEXEC
Use shared memory (shmem) for SOCKET/FILES (dunno about this..)
on listen with missing fd (this happens on uvicorn + reloader case) use getsockname to resolve addr and then decide if to add the socket and backfill the information. This is pretty good approach but has some caveats:

It would work only on the listen case, not if listen is called in parent and accept in child (not sure if this happens and what happens in this case)
the socket information we would get from getsockname would be the "fake port" and not the one the remote would need to listen for, so maybe this is impossible ;(

Dec 28 '22 08:12 aviramha

Please let us know if this is impacting anyone so we can know if we need to prioritize it.

Dec 28 '22 08:12 aviramha

This would probably affect me if I were not already blocked by this issue

Sep 10 '23 16:09 danielloader

I'm not blocked by this, but it does affect my productivity. MacOS M1 Pro

Nov 08 '23 15:11 yoav-orca

I'm not blocked by this, but it does affect my productivity.

MacOS M1 Pro

Hey, does it still happen to you on latest version?

Nov 08 '23 15:11 aviramha

I'm running 3.75.0 and I'm not getting traffic with --reload and I do get traffic without it

Nov 08 '23 15:11 yoav-orca

Also doesn't work when using Uvicorn with more than one worker process.

Jan 01 '24 11:01 aviramha

Happening to me on 3.84.1. I am not using the operator. Not sure if that matters.

Jan 22 '24 19:01 surbas

Ran into the same issue. Might be worth flagging this in the docs somewhere until its fixed

Jun 12 '24 19:06 brandfocus

I've been exploring this, and here are a few solutions I can think of:

:earth_americas: how can it be global, if the code is flat

The main issue we have is that our sockfds are stored in a per-layer global SOCKETS, which are not being shared between forks. So the proposal here is to just move the SOCKETS global to the intproxy, and change every operation that calls SOCKETS.[get|take](&sockfd) to use make_proxy_request_with_response (or something similar).

A problem with this solution is that now every sockfd becomes shared between execs and forks, which is not ideal. Another (minor) problem is that we must manually remember to call insert after take, as some operations call SOCKETS.take to modify the socket, meh.

:donkey: every masterpiece has its cheap copy

Instead of moving the whole SOCKETS to intproxy, whenever we detect a fork (new layer connection?), we send a copy of the SOCKETS minus the ones with CLOEXEC to the intproxy, which in turn sets the new layer connection's SOCKETS to this curated copy.

:peach: thicc intproxy

Delegate more work to the intproxy, by moving the socket/ops there, and creating some sort of MirrordSocket API that keeps an association between a sockfd and the pid that created it, so we can avoid leaking sockfds.

That's it, so far. I'm leaning more towards the :donkey: solution, as :earth_americas: is a bug waiting to happen, and :peach: is too big of a pointless refactor.

Jun 28 '24 22:06 meowjesty

🫏 is what would make most sense and is how it works in terms of OS level too. We can actually pass all SOCKETS downstream and iterate each fd and see if it's still open (so we'll just follow OS logic if cloexec or not)

Jun 29 '24 14:06 aviramha

Fixed in #2610

Jul 31 '24 07:07 aviramha

Looks like the problem is still there on macOS + vscode.

Aug 09 '24 15:08 meowjesty

mirrord mirrord copied to clipboard

TCP Steal/mirror doesn't work with shared sockets

Bug Description

Steps to Reproduce

Backtrace

Relevant Logs

Your operating system and version

Local process

Local process version

Additional Info

mirrord
mirrord copied to clipboard