[ecs/lambda/fargate] [request]: Allow unprivileged containers to create new user namespaces with clone(2) and unshare(2)

Open esamattis opened this issue 11 months ago • 2 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request

I'd like to create a new unprivileged user namespace so that I can use clone(2) to create sandboxed processes, the way nsjail, bubblewrap, isolate and even Chromium do it.

Since Linux 3.8 it should be possible to create them without any extra permissions. From the CLONE_NEWUSER section of the clone(2) man page:

Before Linux 3.8, use of CLONE_NEWUSER required that the caller have three capabilities: CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID. Starting with Linux 3.8, no privileges are needed to create a user namespace.

Linux 3.8 was released in 2013 so I think it is pretty safe to assume that AWS is running newer kernels ;)

But when I try to create a new user namespace with clone(2), it fails with EPERM. I tried this in an unprivileged ECS container and in a Lambda container. The same code ran fine on a local Linux installation when executed as non-root.
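
A minimal sketch of a reproduction, not the original code from this report: it assumes Linux and Python 3 and calls unshare(2) through ctypes rather than clone(2), since both go through the same user namespace creation check. The helper name `probe_userns` is hypothetical. The call runs in a forked child so the calling process keeps its own namespace.

```python
import ctypes
import errno
import os

CLONE_NEWUSER = 0x10000000  # value from <sched.h>


def probe_userns() -> int:
    """Return 0 if unshare(CLONE_NEWUSER) succeeds in a child, else its errno."""
    pid = os.fork()
    if pid == 0:
        # Child: attempt to enter a new user namespace and report the
        # outcome through the exit status (errno values fit in 0..255).
        code = 255
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(CLONE_NEWUSER) == 0:
                code = 0
            else:
                code = ctypes.get_errno() or 255
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else 255


if __name__ == "__main__":
    result = probe_userns()
    if result == 0:
        print("user namespace created")
    else:
        print("unshare failed:", errno.errorcode.get(result, result))
```

Run as a non-root user, this would be expected to report success on a stock Linux installation and EPERM in the environments described above.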

Which service(s) is this request for?

All container services: Lambda Containers, unprivileged ECS, Fargate etc.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

To run sandboxed processes that have no network access, cannot see the PIDs of other processes, and have limited filesystem visibility.
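
As an illustration of the network part of that goal only, here is a sketch assuming Linux and Python 3. The helper name `interfaces_in_sandbox` is hypothetical. A forked child unshares a user namespace together with a network namespace and reports which interfaces it can still see. PID and filesystem isolation would need CLONE_NEWPID and CLONE_NEWNS in addition and are not covered here.

```python
import ctypes
import os
import socket

CLONE_NEWUSER = 0x10000000  # values from <sched.h>
CLONE_NEWNET = 0x40000000


def interfaces_in_sandbox():
    """Return interface names visible after unshare(NEWUSER | NEWNET),
    or None if the namespaces could not be created."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        code = 255
        try:
            os.close(read_fd)
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWNET) == 0:
                # Inside the new network namespace: list what is left.
                names = [name for _, name in socket.if_nameindex()]
                os.write(write_fd, ",".join(names).encode())
                code = 0
            else:
                code = ctypes.get_errno() or 255
        finally:
            os._exit(code)
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    _, status = os.waitpid(pid, 0)
    ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    return data.decode().split(",") if ok else None


if __name__ == "__main__":
    print(interfaces_in_sandbox())
```

Where unprivileged user namespaces are allowed, the child should see only a loopback device and none of the host's interfaces. Where they are blocked, the function returns None.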

Are you currently working around this issue?

I think I need to use privileged ECS containers. Have not tried them yet.

Additional context

Normally, when a new user namespace is created with CLONE_NEWUSER (together with a new mount namespace via CLONE_NEWNS), it is possible to create bind mounts, use pivot_root(2), etc. without any extra permissions.
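
The bind mount case can be sketched as follows, assuming Linux and Python 3. The helper name `bind_mount_probe` and the directory names are hypothetical, and mount(2) is called directly through ctypes. A forked child unshares a user namespace and a mount namespace, bind-mounts one temporary directory onto another, and checks that a marker file shows up. The parent then checks that the mount did not leak into its own namespace.

```python
import ctypes
import os
import tempfile

CLONE_NEWNS = 0x00020000    # values from <sched.h>
CLONE_NEWUSER = 0x10000000
MS_BIND = 4096              # value from <sys/mount.h>


def bind_mount_probe():
    """Return (child_code, leaked). child_code is 0 if the bind mount
    worked in the child, else an errno; leaked is True if the parent
    can see the mount (it should not)."""
    with tempfile.TemporaryDirectory() as base:
        src = os.path.join(base, "src")
        dst = os.path.join(base, "dst")
        os.mkdir(src)
        os.mkdir(dst)
        open(os.path.join(src, "marker"), "w").close()
        marker = os.path.join(dst, "marker")
        pid = os.fork()
        if pid == 0:
            code = 255
            try:
                libc = ctypes.CDLL(None, use_errno=True)
                libc.mount.argtypes = [
                    ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                    ctypes.c_ulong, ctypes.c_void_p,
                ]
                if libc.unshare(CLONE_NEWUSER | CLONE_NEWNS) != 0:
                    code = ctypes.get_errno() or 255
                elif libc.mount(src.encode(), dst.encode(), None, MS_BIND, None) != 0:
                    code = ctypes.get_errno() or 255
                elif os.path.exists(marker):
                    code = 0
            finally:
                os._exit(code)  # skip the parent's temp-dir cleanup
        _, status = os.waitpid(pid, 0)
        child_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else 255
        return child_code, os.path.exists(marker)


if __name__ == "__main__":
    print(bind_mount_probe())
```

In an environment that blocks unprivileged user namespaces, the child is expected to fail at the unshare step and the first element carries the errno.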

This could also allow running rootless Docker or Podman without privileged containers.

There is a great article series on Linux namespaces on lwn.net: https://lwn.net/Articles/531114/

esamattis Aug 03 '23 09:08

I think I need to use privileged ECS containers. Have not tried them yet.

Update: Yes, with privileged containers it is possible to create new user namespaces as a non-root (non-zero UID) user. But it's kinda unfortunate that if you want to add extra sandboxing, you first need to grant more permissions.

esamattis Aug 09 '23 08:08

+1

heri16 May 18 '24 01:05