
Possible solution to the first FIXME in the docker-compose.yml file.

Open buhl opened this issue 5 years ago • 7 comments

https://github.com/jupyter/enterprise_gateway/blob/02a7e0a1e59821b521f72b2f5ac56f21619a6cee/etc/docker/docker-compose.yml#L7

This problem might be solved by creating an entrypoint script like this one, which I use for the same problem on an Alpine Linux image:

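#!/bin/ash
set -e

# Look up the name of the group that owns the mounted docker socket.
GNAME=$(stat -c %G /var/run/docker.sock)
if [ "$GNAME" != "UNKNOWN" ]; then
    # The socket's GID already has a name in this image: add "user" to that group.
    addgroup user "$GNAME"
else
    # No group has that GID yet: create one (the name "dockerhost" is arbitrary),
    # then add "user" to it.
    GID=$(stat -c %g /var/run/docker.sock)
    addgroup -g "$GID" dockerhost
    addgroup user dockerhost
fi
# Drop to the unprivileged user: run the given command, or a shell if none was given.
if [ "$#" -eq 0 ]; then
    su - user -c ash
else
    su - user -c "$*"
fi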

That gives the user access to the docker socket:

localhost:~$ docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock dtest:latest  ash
6600c48b7a97:~$ docker ps
CONTAINER ID        IMAGE                   COMMAND             CREATED             STATUS              PORTS                    NAMES
6600c48b7a97        dtest:latest            "/entry.sh ash"     4 seconds ago       Up 3 seconds                                 priceless_colden
6600c48b7a97:~$ id
uid=100(user) gid=101(user) groups=101(user),101(user),999(ping)
6600c48b7a97:~$ 

Here's my test Dockerfile:

FROM alpine:latest
RUN apk update
RUN apk add docker-cli
RUN addgroup -S user && adduser -s /bin/ash -S user -G user
ADD entry.sh /
ENTRYPOINT ["/entry.sh"]
CMD ["while true; do id; sleep 2; done"]

If I misunderstood the problem or missed something in any other way, I apologize.

buhl avatar Jan 19 '20 10:01 buhl

This looks promising. Would you like to contribute a pull request for this?

I would recommend just going after the ID directly, rather than name first, and we'd probably want protection from the file not existing in the first place.
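A minimal sketch of that suggestion, assuming an Alpine/BusyBox image like the test image above: look up the socket's GID directly, and fail cleanly if the socket is not mounted. The function names and the dockerhost group name are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: helper functions for an entrypoint script (all names are hypothetical).

# Print the numeric GID that owns the socket at $1.
# Returns non-zero with a message if the path is missing or is not a socket.
socket_gid() {
    if [ ! -S "$1" ]; then
        echo "no docker socket at $1" >&2
        return 1
    fi
    stat -c %g "$1"
}

# Print the name of the group with GID $1, read from group file $2
# (default /etc/group). Prints nothing if no group has that GID.
group_name_for_gid() {
    awk -F: -v gid="$1" '$3 == gid { print $1; exit }' "${2:-/etc/group}"
}

# Add user $1 to the group that owns socket $2, creating the group if needed.
# Uses BusyBox addgroup syntax (Alpine) and must run as root.
grant_socket_access() {
    gid=$(socket_gid "$2") || return 1
    name=$(group_name_for_gid "$gid")
    if [ -z "$name" ]; then
        name=dockerhost                # arbitrary, hypothetical group name
        addgroup -g "$gid" "$name"
    fi
    addgroup "$1" "$name"
}
```

An entrypoint could call grant_socket_access user /var/run/docker.sock before switching to the unprivileged user, and decide for itself whether a missing socket should be fatal or only logged. On Debian-based images, groupadd and usermod would replace the BusyBox addgroup calls.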

Thank you for opening this issue!

kevin-bates avatar Jan 20 '20 16:01 kevin-bates

Hi @kevin-bates, sorry for the late reply; I just returned from a vacation. I will try to find some time to fit this solution into a working example for you to look at.

buhl avatar Jan 30 '20 21:01 buhl

Right on - no worries. Welcome back!

kevin-bates avatar Jan 30 '20 22:01 kevin-bates

Hi @kevin-bates, I made some changes to https://github.com/buhl/enterprise_gateway/blob/master/etc/docker/enterprise-gateway/Dockerfile and https://github.com/buhl/enterprise_gateway/blob/master/etc/docker/enterprise-gateway/start-enterprise-gateway.sh so that the jovyan user is now added to the docker group. I have spent most of two evenings trying to get the enterprise gateway to build and run, and I am not quite there yet. I can now start an enterprise gateway with docker-compose up, but I can't seem to get the enterprise gateway to work (I get {"reason": "Not Found", "message": ""} on all requests). However, the jovyan user can talk to the docker daemon, as demonstrated below:

enterprise_gateway/etc/docker $ docker-compose exec enterprise-gateway  /bin/bash
root@ab7012645bbe:/usr/local/bin# su - jovyan
jovyan@ab7012645bbe:~$ id
uid=1000(jovyan) gid=100(users) groups=100(users),999(docker)
jovyan@ab7012645bbe:~$ ps wwwfaux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
[...]
root         1  0.0  0.0   4520   792 ?        Ss   20:42   0:00 tini -g -- /usr/local/bin/start-enterprise-gateway.sh
root         6  0.0  0.0  11588  3132 ?        S    20:42   0:00 /bin/bash /usr/local/bin/start-enterprise-gateway.sh
root        18  0.0  0.0  50940  3452 ?        S    20:42   0:00  \_ su --preserve-environment jovyan -c /opt/conda/bin/jupyter enterprisegateway .--log-level=DEBUG .--MappingKernelManager.cull_idle_timeout=3600 .--MappingKernelManager.cull_interval=60 .--MappingKernelManager.cull_connected=False
jovyan      19  0.1  0.2  73988 48756 ?        Ss   20:42   0:01      \_ /opt/conda/bin/python /opt/conda/bin/jupyter-enterprisegateway --log-level=DEBUG --MappingKernelManager.cull_idle_timeout=3600 --MappingKernelManager.cull_interval=60 --MappingKernelManager.cull_connected=False
jovyan@ab7012645bbe:~$ curl --unix-socket /var/run/docker.sock http://localhost/containers/json
[{"Id":"ab7012645bbee25a3dfa432050b4b96e1d7e0db80619b6cee7715e7909abd596","Names":["/docker_enterprise-gateway_1"],"Image":"elyra/enterprise-gateway:dev","ImageID":"sha256:4d5551596ca09de98fa6372f613a0ed30b9201f1b2c42f3bd8f8de1c348aa8af","Command":"tini -g -- /usr/local/bin/start-enterprise-gateway.sh","Created":1581972148,"Ports":[{"IP":"0.0.0.0","PrivatePort":8888,"PublicPort":8888,"Type":"tcp"}],"Labels":{"app":"enterprise-gateway","com.docker.compose.config-hash":"82055d9f2565b2df47f89c910fefe89c84321473a215b6b755b92b4ed638108f","com.docker.compose.container-number":"1","com.docker.compose.oneoff":"False","com.docker.compose.project":"docker","com.docker.compose.service":"enterprise-gateway","com.docker.compose.version":"1.24.1","component":"enterprise-gateway","maintainer":"Jupyter Project <[email protected]>"},"State":"running","Status":"Up 16 minutes","HostConfig":{"NetworkMode":"docker_enterprise-gateway"},"NetworkSettings":{"Networks":{"docker_enterprise-gateway":{"IPAMConfig":null,"Links":null,"Aliases":null,"NetworkID":"3b4760bd7fd203154baeca0fcfdd71ea048f1b6a9c5fe870a31842ac09188e67","EndpointID":"f6fad93642d0bf5ff39812d90a3234f3693c70df5be4ae6b52d575746bff0620","Gateway":"172.20.0.1","IPAddress":"172.20.0.2","IPPrefixLen":16,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"MacAddress":"02:42:ac:14:00:02","DriverOpts":null}}},"Mounts":[{"Type":"bind","Source":"/var/run/docker.sock","Destination":"/var/run/docker.sock","Mode":"rw","RW":true,"Propagation":"rprivate"}]}]
jovyan@ab7012645bbe:~$

I feel I have to say that running the service as a non-root user in the container, but giving it access to docker.sock, is effectively like giving the jovyan user root on the host machine :) So if this exercise is only about dropping root for its own sake, it might not be super important.

Well, I don't really know what to do next.

buhl avatar Feb 17 '20 21:02 buhl

Hi @buhl, sorry for the frustration. I agree, EG has a non-trivial build.

You do not need to worry about building the demo-base or enterprise-gateway-demo images. Those are purely for demo and YARN integration tests.

The make targets you'll need to invoke are: clean dist enterprise-gateway (that is, make clean dist enterprise-gateway), the last of which builds the EG image. The kernel-images target builds the various kernel-related images, but those shouldn't have to change for this.

I agree with what you say about root and docker.sock. The requirement for operating in docker environments is that EG be able to start images, query running containers based on labels, etc. (discovery) and stop containers via the docker API. My understanding is that "docker in docker" requires docker.sock and mounting docker.sock requires root. So this may just need to be the way things are. Perhaps we just change the FIXME to a nasty WARNING message. :smile:

At any rate, I'm hoping we can get your build working so you're free to check things out and make contributions.

Regarding runtime experiences... What command are you issuing to produce {"reason": "Not Found", "message": ""}? I usually hit /api/kernelspecs as my litmus test that EG is able to service requests. You should get the JSON for each of the found kernelspecs returned.
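As a quick illustration of that litmus test, the check could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and the default of http://localhost:8888 is an assumption based on the port mapping shown in the container listing earlier in this thread.

```shell
#!/bin/sh
# Sketch only: probe an Enterprise Gateway instance (function names are hypothetical).

# Build the kernelspecs URL from a base URL, tolerating one trailing slash.
kernelspecs_url() {
    printf '%s/api/kernelspecs\n' "${1%/}"
}

# Fetch the kernelspecs JSON from the gateway at $1
# (assumed default: http://localhost:8888).
# --fail makes curl return non-zero on HTTP errors such as 404.
check_eg() {
    curl --silent --show-error --fail "$(kernelspecs_url "${1:-http://localhost:8888}")"
}
```

Running check_eg with no argument should print the kernelspecs JSON if EG is servicing requests; a non-zero exit status suggests the gateway is unreachable or is returning an HTTP error.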

If you're going through Notebook, then things could be tied up with an incorrect host or port in your --gateway-url value or something like that. Does the EG log show anything on each request attempt?

kevin-bates avatar Feb 17 '20 22:02 kevin-bates

Hi @kevin-bates, great, thanks! I will try with the make targets later this week. I actually got enterprise-gateway to build and run. The /api/kernelspecs endpoint you suggested also seems to work, but I have yet to try to start a notebook. I will try to clean up my branch, revert the unnecessary changes and attempt to submit a pull request.

I had a problem I didn't know how to solve, so I had to remove the --KernelSpecManager.whitelist=${EG_KERNEL_WHITELIST} option from the jupyter enterprisegateway initialization because I got the error traitlets.traitlets.TraitError: The 'whitelist' trait of a KernelSpecManager instance must be a set, but a value of class 'str' (i.e. '[r_docker,python_docker,python_tf_docker,scala_docker,spark_r_docker,spark_python_docker,spark_scala_docker]') was specified.

buhl avatar Feb 17 '20 23:02 buhl

OK, yeah, set-based traitlets can be difficult to configure correctly. Looking at the appropriate files, and comparing them to other systems, I believe each item must be single-quoted, with the whole list enclosed in square brackets. Here are a couple of examples that should work:

https://github.com/jupyter/enterprise_gateway/blob/master/etc/kubernetes/enterprise-gateway.yaml#L141
https://github.com/jupyter/enterprise_gateway/blob/master/etc/docker/enterprise-gateway/start-enterprise-gateway.sh#L28
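To illustrate the quoting described above, here is a hypothetical helper that turns a plain comma-separated list into the single-quoted, bracketed form. The function name is made up for this sketch, and the assumption that this form is what the whitelist trait expects should be checked against the linked examples.

```shell
#!/bin/sh
# Sketch only: quote_whitelist is a hypothetical helper, not part of EG.

# Turn "a,b,c" into "['a','b','c']" by replacing each comma with ','
# and wrapping the result in ['...'].
quote_whitelist() {
    printf "['%s']\n" "$(printf '%s' "$1" | sed "s/,/','/g")"
}
```

For example, EG_KERNEL_WHITELIST=$(quote_whitelist r_docker,python_docker) yields ['r_docker','python_docker'], which would then be passed with the whole value double-quoted, as in --KernelSpecManager.whitelist="${EG_KERNEL_WHITELIST}", so the shell does not strip the inner quotes.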

Were you trying to set up EG_KERNEL_WHITELIST with your own set of values? Or are things getting modified before their actual use?

kevin-bates avatar Feb 17 '20 23:02 kevin-bates