
[ECS] [Fargate Task Storage]: Allow permission configuration of Fargate bind mounts

Open Alex-Richman opened this issue 4 years ago • 26 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem: Ephemeral storage for Fargate tasks with readonlyRootFilesystem and a non-root user.

On Fargate platform 1.3.0 this was achievable (in an undocumented/unintentional manner) by configuring a docker local volume at the task level and mounting it to /tmp in each service:

      volumes:
        - name: "tmpfs"
          dockerVolumeConfiguration:
            scope: "task"
            driver: "local"

...

          mountPoints:
            - sourceVolume: "tmpfs"
              containerPath: "/tmp"

(resulting in a world-writable tmp directory mounted to /tmp/ within the container)

On Fargate platform 1.4.0 docker local volumes are completely unavailable, and the new (officially recommended) way of implementing ephemeral storage for Fargate tasks is using a bind mount [1].

The problem with using a bind mount is that ECS mounts it as writable only by root, so a container running as a non-root user is unable to write any temporary files. Having the container run as root is generally undesirable for security reasons, though practically I expect the impact on ECS is limited since a root-based container escape would just dump an attacker into the ECS host which is presumably heavily sandboxed.

The ideal solution would be for ECS to support configuring permissions on bind mounts, or better still support tmpfs on Fargate [2][3].

[1] https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-task-storage.html
[2] https://github.com/aws/containers-roadmap/issues/736
[3] https://github.com/aws/containers-roadmap/issues/710
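For comparison, the EC2 launch type already exposes this kind of scratch space through the linuxParameters.tmpfs setting in a container definition; it is not supported on Fargate today, which is essentially what the linked issues ask to change. A minimal sketch of that shape (size is in MiB; the path and mount options are illustrative):

"containerDefinitions": [
  {
    ...
    "linuxParameters": {
      "tmpfs": [
        {
          "containerPath": "/tmp",
          "size": 128,
          "mountOptions": ["rw", "noexec", "nosuid"]
        }
      ]
    }
  }
]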

Alex-Richman avatar Jun 10 '20 16:06 Alex-Richman

Hello!

To run a container image as a non-root user, a customer can do the following:

They can declare the path they want to expose as a VOLUME and run chown on it in their own Dockerfile.

As an example, let's consider an image that uses node as the base image (a Node.js environment) and wants to use /var/log/exported with node:node as the user and group respectively. By specifying a VOLUME directive for /var/log/exported, the permissions set on that path will be reflected in their task volumes.

To understand how this can be achieved, let us look at the following Dockerfile.

FROM node:12-slim ## A Node.js base image
RUN mkdir -p /var/log/exported && chown node:node /var/log/exported ## Create the directory and change its ownership from root to node
VOLUME ["/var/log/exported"] ## Specifying a VOLUME directive preserves those permissions

Please let us know if it works for you. Thank you

manugupt1 avatar Dec 12 '20 02:12 manugupt1

RUN chown node:node /var/log/exported ## Changing permissions from root to node
VOLUME ["/var/log/exported"] ## Specifying a VOLUME directive applies the permission

This worked fine on Platform Version 1.3.0 but fails with 1.4.0. Now that 1.4.0 will be the new default, we need a new workaround.

tarun-wadhwa-mmt avatar Feb 19 '21 12:02 tarun-wadhwa-mmt

I crashed a stack on this issue; something is really needed here. When I hit it I was not explicitly using version 1.4.0: I hadn't set PlatformVersion and previously got 1.3.0 by default, but the default has now changed to 1.4.0, which still has this issue.

jpradelle avatar Mar 12 '21 10:03 jpradelle

I ended up using an EFS volume

gokhanoner avatar Mar 16 '21 23:03 gokhanoner

Hello,

We have recently updated our documentation with some examples of how to use bind mounts. These examples include:

  1. Getting an empty data volume for one or more containers.
  2. Exposing a path and its contents from a Dockerfile to a container.
  3. Running a particular data volume in a non-root environment.

The updated documentation can be found here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/bind-mounts.html#bind-mount-examples

If this does not solve your use-case, please reach out to AWS Support.

Thanks Manu

manugupt1 avatar Mar 23 '21 22:03 manugupt1

I've looked closely at the Dockerfile my application uses, and compared it to the examples. While there are differences, as far as I can tell they are semantically equivalent, yet do not work with Fargate bind mounts. The process is ultimately unable to interact with the mounted volume.

Has anyone actually been able to make this work?

mattmassicotte avatar Mar 25 '21 16:03 mattmassicotte

Hi Matt and Gökhan,

Can you please open a support case, so that we can work with you to understand why this does not work?

Thanks Manu

manugupt1 avatar Mar 25 '21 19:03 manugupt1

@manugupt1 I'm afraid not, as technical support isn't covered by my current service level. However, I can share the Dockerfile with you, in case you want to inspect the differences.

https://github.com/apache/druid/blob/753bce324bdf8c7c5b2b602f89c720749bfa6e22/distribution/docker/Dockerfile

mattmassicotte avatar Mar 26 '21 14:03 mattmassicotte

Hi Matt, I looked through your Dockerfile and found that the path used in the VOLUME directive is a symlink. Ref: https://github.com/apache/druid/blob/753bce324bdf8c7c5b2b602f89c720749bfa6e22/distribution/docker/Dockerfile#L38

We have updated our documentation to say that the VOLUME directive should map to an absolute path. This can be found at https://docs.aws.amazon.com/AmazonECS/latest/developerguide/bind-mounts.html#bind-mount-considerations

Thanks Manu

manugupt1 avatar Apr 02 '21 15:04 manugupt1

@manugupt1 that was really awesome of you. Thanks so much for digging in. I'll bring this up with the dockerfile maintainers.

mattmassicotte avatar Apr 02 '21 16:04 mattmassicotte

To me, this is still a very valid request.

We have an ECS service based on 3 public Bitnami containers. We want to share volumes between these containers: one for config, one for data. All these containers run as the same non-root user. Yet in Fargate we can't run this setup, because we cannot manipulate the uid/gid of the named volume. In this case we have no control over the Dockerfiles, so we cannot expose a volume that way either.

jellevanhees avatar Jun 03 '21 07:06 jellevanhees

Hello, thanks for reaching out. Currently, this can be achieved by using an init container. The task definition needs a non-essential container that runs before all other containers, declared via the dependsOn clause. This container sets the appropriate permissions for the images running with a non-root uid; once it has scoped down the permissions it exits, the other containers start, and they are then able to write into these volumes.

A simple scaffolding of the task def can be as follows:

{
  ...
  "containerDefinitions": [
    ...
    {
      "name": "permissions-init",
      "image": "busybox:latest", # Can be any image that has chmod / chown; it runs as root, so no sudo is needed
      "entryPoint": [
        "sh",
        "-c"
      ],
      "command": [
        "chmod 0777 /example-vol" # one option; "chown 1000:1000 /example-vol" is another
      ],
      "mountPoints": [
        {
          "containerPath": "/example-vol",
          "sourceVolume": "example-vol"
        }
      ],
      "essential": false # Required
    },
    {
      "name": "essential-container",
      "image": "busybox:latest",
      "dependsOn": [
        {
          "containerName": "permissions-init",
          "condition": "SUCCESS" # or COMPLETE, depending on the use-case
        }
      ]
    }
    ...
  ],
  "volumes": [
    {
      "name": "example-vol"
    }
  ],
  ...
}

manugupt1 avatar Jun 25 '21 00:06 manugupt1

I slogged through this a bit and ended up with something similar to @manugupt1's example, except converted to pull a container from our internal ECR (which is allowed) instead of Docker Hub (which is firewalled and also not reliable due to rate-limiting), and with logging enabled.

Looking at our containers, I think there are two things which would make it a lot easier to enable read-only root filesystems. One would be removing the need for permissions-changing containers by allowing the service to specify those permissions as part of the bind mount configuration:

"volumes": [
{
    "sourceVolume": "java-tmp",
    "containerPath": "/var/run/search",
    "user": "search-service",
    "group": "5555",
    "mode": "0755"
}
]

The other would be more complicated but it would be really nice if there was a way to specify an overlay mount over an existing mount point. We have a handful of things where something creates new files next to the distributed source at start up (e.g. Python creating pyc / __pycache__ (yes, I know about compileall but don't want to patch hundreds of containers), or a Java program compiling extensions on startup) and it would be handy for those applications if I could mount an overlay on top of them until we can get the developers to redesign them to use separate storage.
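For anyone unfamiliar with the idea, here is a rough sketch of what such an overlay looks like on a plain Linux host; the paths are purely illustrative, and ECS exposes no API for this today:

# Keep the image's /app/lib read-only underneath, while new writes
# (e.g. generated *.pyc / __pycache__ files) land in scratch storage.
mkdir -p /scratch/upper /scratch/work
mount -t overlay overlay \
  -o lowerdir=/app/lib,upperdir=/scratch/upper,workdir=/scratch/work \
  /app/lib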

acdha avatar Mar 28 '23 21:03 acdha

I consider this a bug, because it works with ECS/EC2 AND with EFS!

As in, in both ECS/EC2 and with EFS, the mount point is "magically" owned by the non-privileged user the task is started as.

However, in ECS/Fargate, the mount point is owned by root:root, making it useless. ESPECIALLY with the 0755 mode!

EITHER use the containerDefinition.user value for the mount point (won't allow mode to be set, but 2775 is what WE want, which I think is a reasonable [default] value here) OR allow for additional config as in acdha's example above. Although, having separate user and group options might be overkill considering that containerDefinition.user already exists. In that example, only mode would be needed. AND possibly additional mount options perhaps..

Using the VOLUME option when creating the container would require a rebuild of the image (with additional overhead in testing and QA - EVEN THOUGH IT'S ONLY A MINOR CHANGE!!) when switching from EC2 to Fargate - we're in the process of changing ALL (or most?) of our services over to Fargate to try to cut some costs.

FransUrbo avatar Jul 09 '23 09:07 FransUrbo

I see this issue is "Coming Soon" on the roadmap. Is there any ETA for a fix? Trying to figure out whether we need to identify an alternative solution or wait for the fix.

nschoenbaechler avatar Jul 19 '23 18:07 nschoenbaechler

The workaround isn't too difficult. Set up a VOLUME in the Dockerfile with the path of the directory. Create it, chown and chmod it first..

You'll get double mounts in the container, but it turns out that the "real" one, the one with the correct user:group ownership and mode, is the "last" one.

I couldn't wait, so we'll go with the workaround. For now.
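A minimal sketch of that workaround, with an illustrative path, UID/GID and mode:

FROM public.ecr.aws/amazonlinux/amazonlinux:2
# Create the directory first, then hand it to the non-root user with the desired mode.
RUN mkdir -p /app/scratch && chown 1000:1000 /app/scratch && chmod 2775 /app/scratch
# Declaring it as a VOLUME is what carries the ownership/mode over to the Fargate bind mount.
VOLUME ["/app/scratch"]
USER 1000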

FransUrbo avatar Jul 19 '23 20:07 FransUrbo

Hi, I found a better solution. Once the container runs and fails, check the CloudWatch logs of the execution. You will see which group the AWS ECS REST API uses for the mount point; in my case it was 985 or something like that. I added that group in my Dockerfile. After that my user (non-root required) was able to access the mount point.

RazvanGherlea avatar Aug 08 '23 12:08 RazvanGherlea

Did that somehow change the mode of the mount point? It's 0755 on mine (without that group), which means the group doesn't have access to [write to] it anyway.. But does adding the group change the mode as well?

FransUrbo avatar Aug 08 '23 12:08 FransUrbo

Yes, I got the docker socket mounted and faced the same issue as described here. After adding the group it worked without double mounts.

In my dockerfile:

RUN groupadd -o -g 994 dockersock
RUN usermod -aG dockersock "runner"

RazvanGherlea avatar Aug 08 '23 13:08 RazvanGherlea

"Docker socket mounted".. Is this on ECS/Fargate really? How do you get to the socket, it's serverles architecture..

FransUrbo avatar Aug 08 '23 14:08 FransUrbo

To be honest it's ECS but not Fargate. On the other hand, just to clarify: "serverless" is a misnomer in the sense that servers are still used by cloud service providers to execute code for developers. The Docker mount point was just an example of how you can identify the mount permissions of the ECS container and fix them in your Dockerfile. Have you tried looking in the container logs after adding a command block to list the mounts and grep your mount point to see what permissions it gets?

command = [
  "/bin/mount | grep your_mount_point"
]

It is at this point that your container performs the mount, not when you build your Dockerfile :)

RazvanGherlea avatar Aug 09 '23 07:08 RazvanGherlea

Ok, yeah. That's the thing about this ticket - ECS works correctly, but Fargate does not!

They SHOULD (imo!) work exactly the same, but they don't :(. Please read the WHOLE ticket and you'll see the problem.

The "serverless" is the correct word for this. And Lambda. WE, as users, don't have any access to "servers". That EVERYTHING, including R53 zones, SNS topics, AI etc and everything else, MUST run on a server "somewhere" is besides the point. It's serverless - FOR US!

FransUrbo avatar Aug 09 '23 09:08 FransUrbo

I'm also tracking this issue, as I've had to resort to workarounds due to its impact. It's surprising to see that such a significant problem still remains unresolved after more than three years.

garysassano avatar Oct 10 '23 22:10 garysassano

Hey there, I'm from the Fargate team. Apologies for the confusion on this issue. The below is an example of how to configure bind mounts owned by non-root users on Fargate PV 1.4 without the need for an init container. Please let us know if there are still gaps in what you are trying to achieve when using the below.

$ cat Dockerfile
FROM public.ecr.aws/amazonlinux/amazonlinux:2
 
RUN yum install -y shadow-utils && yum clean all
RUN useradd node
RUN mkdir -p /var/log/exported && chown node:node /var/log/exported
USER node
RUN touch /var/log/exported/examplefile
VOLUME ["/var/log/exported"]
 
$ cat taskdef.json
{
  "containerDefinitions": [
    {
      "name": "c1",
      "mountPoints": [
        {
          "containerPath": "/var/log/exported",
          "sourceVolume": "myvol"
        }
      ],
      "command": [
        "sh",
        "-c",
        "whoami; ls -l /var/log/exported; touch /var/log/exported/c1.txt; for i in 1 2 3 4 5; do ls -l /var/log/exported; sleep 2; done"
      ],
      ...
    },
    {
      "name": "c2",
      "mountPoints": [
        {
          "containerPath": "/var/log/exported",
          "sourceVolume": "myvol"
        }
      ],
      "command": [
        "sh",
        "-c",
        "whoami; ls -l /var/log/exported; touch /var/log/exported/c2.txt; for i in 1 2 3 4 5; do ls -l /var/log/exported; sleep 2; done"
      ],
      ...
    }
  ],
  "volumes": [
    {
      "name": "myvol"
    }
  ],
  ...
}

The logs from container 1

$ whoami
node
 
$ ls -l /var/log/exported
-rw-r--r-- 1 node node 0 Sep 20 20:06 examplefile
 
$ touch c1.txt
 
$ ls -l /var/log/exported
-rw-r--r-- 1 node node 0 Sep 20 21:48 c1.txt
-rw-r--r-- 1 node node 0 Sep 20 20:06 examplefile
 
# After c2 starts we see the c2.txt file.
$ ls -l /var/log/exported
-rw-r--r-- 1 node node 0 Sep 20 21:48 c1.txt
-rw-r--r-- 1 node node 0 Sep 20 21:48 c2.txt
-rw-r--r-- 1 node node 0 Sep 20 20:06 examplefile

The logs from container 2

$ whoami
node
 
$ ls -l /var/log/exported
-rw-r--r-- 1 node node 0 Sep 20 20:06 examplefile
-rw-r--r-- 1 node node 0 Sep 20 21:48 c1.txt
 
$ touch c2.txt
 
$ ls -l /var/log/exported
-rw-r--r-- 1 node node 0 Sep 20 21:48 c1.txt
-rw-r--r-- 1 node node 0 Sep 20 21:48 c2.txt
-rw-r--r-- 1 node node 0 Sep 20 20:06 examplefile

alexcmms avatar Nov 02 '23 20:11 alexcmms

Hey there, I'm from the Fargate team. Apologies for the confusion on this issue. The below is an example of how to configure bind mounts owned by non-root users on Fargate PV 1.4 without the need for an init container. Please let us know if there are still gaps in what you are trying to achieve when using the below.

I just hit an example of that deploying a third-party container on ECS where I don't have an easy way to override their container definition without forking the Dockerfile to add something like your chown step before it changes users. Right now that means I do something like this (from the deployment of the OWASP DependencyTrack app I was just working on):

    {
        "name": "set-mount-permissions",
        "image": "public.ecr.aws/amazonlinux/amazonlinux:2023",
        "essential": false,
        "mountPoints": [
            {
                "containerPath": "/data",
                "sourceVolume": "dependency-track"
            }
        ],
        "command": [
            "install -d -o 1000 -g 1000 -m 0775 /data/.dependency-track"
        ]
    },
    {
        "dependsOn": [
            {
                "containerName": "set-mount-permissions",
                "condition": "SUCCESS"
            }
        ],
        "name": "DependencyTrack",
…

It would be convenient if instead I could just do something like this:

        "mountPoints": [
            {
                "containerPath": "/data",
                "sourceVolume": "dependency-track",
                "readOnly": false,
                "permissions": {
                    "owner": "1000",
                    "group": "1000",
                    "mode": "775"
                }

            }
        ],

or, even better, have some kind of magic value which could be used to say “the same UID/GID as the container's USER”:

        "mountPoints": [
            {
                "containerPath": "/data",
                "sourceVolume": "dependency-track",
                "readOnly": false,
                "permissions": {
                    "owner": "USER (or maybe CONTAINER-USER?)",
                    "group": "USER",
                    "mode": "775"
                }
            }
        ],

acdha avatar Nov 08 '23 19:11 acdha

It would be great if ECS Fargate made the owner/permissions configurable for container bind mount volumes, or at least made the default more useful (e.g. world-writable, similar to Kubernetes emptyDir volumes).

We are using the image volume (Dockerfile VOLUME) workaround to deploy read-only-root-filesystem, non-root containers on ECS Fargate 1.4.0 with a writable tmpdir filesystem.

But we'd prefer to keep this configuration in the deployment, where it belongs. Baking the image volume into our application images means we're forced to use that volume whenever we deploy the image in other contexts (e.g. Kubernetes).
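For reference, the Kubernetes behaviour mentioned above looks roughly like this (names and UIDs are illustrative): an emptyDir volume is writable by the pod's containers, and fsGroup hands its group ownership to a non-root group.

apiVersion: v1
kind: Pod
metadata:
  name: scratch-example
spec:
  securityContext:
    fsGroup: 1000                      # group ownership applied to the emptyDir
  containers:
    - name: app
      image: public.ecr.aws/amazonlinux/amazonlinux:2023
      command: ["sh", "-c", "touch /scratch/ok && ls -l /scratch && sleep 3600"]
      securityContext:
        runAsUser: 1000
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}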

jdoylei avatar Jan 31 '24 22:01 jdoylei