mirrord
mirrord copied to clipboard
TCP Steal/mirror doesn't work with shared sockets
Bug Description
When a parent process binds a socket, then children sockets listen/accept on that socket mirrord doesn't intercept those calls because the fd isn't managed (doesn't exist in our fork). Using this issue to document use cases where this happens for now:
- uvicorn when using
--reload
flag.
Steps to Reproduce
from fastapi import FastAPI
import sys
app = FastAPI()
@app.get("/")
async def root():
print("cake")
return {"message": "Hello World"}
@app.get("/hello/{name}")
async def say_hello(name: str):
print("pake")
return {"message": f"Hello {name}"}
run with mirrord exec --target pod/pod uvicorn -- --reload --port 80 main:app
Backtrace
No response
Relevant Logs
No response
Your operating system and version
macOS
Local process
python
Local process version
No response
Additional Info
No response
Possible solutions:
- hooking fork/exec, then passing down the
SOCKET/FILES
fd struct as environment variable for fds that aren't CLOEXEC - Use shared memory (shmem) for
SOCKET/FILES
(dunno about this..) - on
listen
with missing fd (this happens on uvicorn + reloader case) use getsockname to resolve addr and then decide if to add the socket and backfill the information. This is pretty good approach but has some caveats:
- It would work only on the listen case, not if listen is called in parent and accept in child (not sure if this happens and what happens in this case)
- the socket information we would get from getsockname would be the "fake port" and not the one the remote would need to listen for, so maybe this is impossible ;(
Please let us know if this is impacting anyone so we can know if we need to prioritize it.
This would probably affect me if I were not already blocked by this issue
I'm not blocked by this, but it does affect my productivity. MacOS M1 Pro
I'm not blocked by this, but it does affect my productivity.
MacOS M1 Pro
Hey, does it still happen to you on latest version?
I'm running 3.75.0
and I'm not getting traffic with --reload
and I do get traffic without it
Also doesn't work when using Uvicorn with more than one worker process.
Happening to me on 3.84.1. I am not using the operator. Not sure if that matters.
Ran into the same issue. Might be worth flagging this in the docs somewhere until its fixed
I've been exploring this, and here are a few solutions I can think of:
- :earth_americas: how can it be global, if the code is flat
The main issue we have is that our sockfd
s are stored in a per-layer global SOCKETS
, which are not being shared between forks. So the proposal here is to just move the SOCKETS
global to the intproxy
, and change every operation that calls SOCKETS.[get|take](&sockfd)
to use make_proxy_request_with_response
(or something similar).
A problem with this solution is that now every sockfd
becomes shared between exec
s and fork
s, which is not ideal. Another (minor) problem is that we must manually remember to call insert
after take
, as some operations call SOCKETS.take
to modify the socket, meh.
- :donkey: every masterpiece has its cheap copy
Instead of moving the whole SOCKETS
to intproxy
, whenever we detect a fork
(new layer connection?), we send a copy of the SOCKETS
minus the ones with CLOEXEC
to the intproxy
, which in turn sets the new layer connection's SOCKETS
to this curated copy.
- :peach: thicc intproxy
Delegate more work to the intproxy
, by moving the socket/ops
there, and creating some sort of MirrordSocket
API that keeps an association between a sockfd
and the pid
that created it, so we can avoid leaking sockfd
s.
That's it, so far. I'm leaning more towards the :donkey: solution, as :earth_americas: is a bug waiting to happen, and :peach: is too big of a pointless refactor.
🫏 is what would make most sense and is how it works in terms of OS level too.
We can actually pass all SOCKETS
downstream and iterate each fd and see if it's still open (so we'll just follow OS logic if cloexec or not)
Fixed in #2610
Looks like the problem is still there on macOS + vscode.