bypass4netns

Use `SECCOMP_ADDFD_FLAG_SEND`

Open AkihiroSuda opened this issue 3 years ago • 3 comments

To inject it at socket(2) time safely, though, we need to use SECCOMP_ADDFD_FLAG_SEND in the addfd call. I added that flag to the kernel due to a race condition you can easily hit otherwise: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.17-rc2&id=0ae71c7720e3ae3aabd2e8a072d27f7bd173d25c.

Originally posted by @rata in https://github.com/rootless-containers/bypass4netns/issues/1#issuecomment-1027948113

AkihiroSuda, Feb 16 '22 07:02

Are you planning to switch to the socket syscall, as I suggested, then? Let me know if you have any doubts about the flag or if I can help :)

rata, Feb 16 '22 15:02

Thanks, but I guess we should first try adapting the current code to use SECCOMP_ADDFD_FLAG_SEND, and then try hooking socket(2).

AkihiroSuda, Feb 18 '22 07:02

I don't think the flag is needed right now. The upstream kernel commit says connect(2) where it should say socket(2). As I explained in the comment you linked here, if you set the "newfd" field when issuing the addfd ioctl, this race isn't a problem. It becomes a problem only if you handle socket(2), not connect(2).

The thing is, if the container receives EINTR after the agent did the addfd but before it answered the syscall, the syscall will be retried. If the agent then does the addfd again without setting newfd, a new fd will be allocated. This can happen several times, and the container ends up with N fds instead of just 1. But if you always use the same "newfd" number, then even if you inject the fd several times, each injection closes the old one (it has the same fd number, and that is what addfd does when newfd is already in use), so there is no leak :)

rata, Feb 18 '22 11:02
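
The retry behaviour described in the last comment can be illustrated with a toy model. This is only a sketch in Python, not the kernel API and not code from bypass4netns: `ToyFdTable`, `injected_count`, and the target number 5 are hypothetical names and values chosen for illustration. It assumes the agent repeats its addfd once per EINTR-driven retry. In the real seccomp interface, targeting a specific number also requires setting `SECCOMP_ADDFD_FLAG_SETFD` alongside the `newfd` field.

```python
class ToyFdTable:
    """Toy stand-in for a container process's fd table (not the kernel API)."""

    def __init__(self, open_fds=(0, 1, 2)):
        # Map fd number -> label of the object it refers to.
        self.fds = {fd: "stdio" for fd in open_fds}

    def addfd(self, obj, newfd=None):
        """Install obj, with or without a fixed target fd number."""
        if newfd is None:
            # No target requested: pick the lowest unused number.
            fd = 0
            while fd in self.fds:
                fd += 1
        else:
            # Target requested: whatever was at newfd is replaced (closed).
            fd = newfd
        self.fds[fd] = obj
        return fd


def injected_count(retries, newfd=None):
    """Count injected sockets left open after `retries` EINTR-driven retries."""
    table = ToyFdTable()
    # The agent runs once for the first attempt and once per retry.
    for attempt in range(retries + 1):
        table.addfd(f"host-socket-{attempt}", newfd=newfd)
    return sum(1 for obj in table.fds.values() if obj.startswith("host-socket"))


leaked = injected_count(retries=3)          # kernel picks a fresh number each time
fixed = injected_count(retries=3, newfd=5)  # agent always targets fd 5
print(leaked, fixed)  # prints "4 1"
```

With three retries, the variant without a target number leaves four injected sockets open, while the variant that always targets the same number leaves exactly one. That is the "N fds instead of just 1" leak described above.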