Add support for devices with "service create"

Open flx42 opened this issue 9 years ago • 81 comments

Initially reported: https://github.com/docker/docker/issues/24865, but I realized it actually belongs here. Feel free to close the other one if you want. Content of the original issue copied below.

Related: #1030

Currently, it's not possible to add devices with docker service create; there is no equivalent of docker run --device=/dev/foo.

I'm an author of nvidia-docker with @3XX0, and we need to add device files (the GPUs) and volumes to the starting containers in order to enable GPU apps as services. See the discussion here: https://github.com/docker/docker/issues/23917#issuecomment-233670078 (summarized below).

We figured out how to add a volume provided by a volume plugin:

$ docker service create --mount type=volume,source=nvidia_driver_367.35,target=/usr/local/nvidia,volume-driver=nvidia-docker [...]

But there is no solution for devices. @cpuguy83 and @justincormack suggested using --mount type=bind, but it doesn't seem to work; it's probably like doing a mknod without the proper device cgroup whitelisting.

$ docker service create --mount type=bind,source=/dev/nvidiactl,target=/dev/nvidiactl ubuntu:14.04 sh -c 'echo foo > /dev/nvidiactl'
$ docker logs stupefied_kilby.1.2445ld28x6ooo0rjns26ezsfg
sh: 1: cannot create /dev/nvidiactl: Operation not permitted

It's probably equivalent to this:

$ docker run -ti ubuntu:14.04
root@76d4bb08b07c:/# mknod -m 666 /dev/nvidiactl c 195 255
root@76d4bb08b07c:/# echo foo > /dev/nvidiactl
bash: /dev/nvidiactl: Operation not permitted

Whereas the following works (invalid arg is normal, but no permission error):

$ docker run -ti --device /dev/nvidiactl ubuntu:14.04
root@ea53a1b96226:/# echo foo > /dev/nvidiactl
bash: echo: write error: Invalid argument
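To make the cgroup point concrete: on a cgroup v1 host, the devices controller only lets a container open a device node that has a matching allow rule of the form `TYPE MAJOR:MINOR ACCESS`. Below is a minimal sketch of building such a rule. `device_rule` is a hypothetical helper name (not part of Docker), and it assumes the major/minor numbers are given in hex, as GNU `stat -c '%t %T'` prints them.

```shell
#!/bin/sh
# Hypothetical helper: build a devices-cgroup allow rule from a device type
# ("c" for character, "b" for block) and hex major/minor numbers.
# Assumption: the hex values come from `stat -c '%t %T' <device>`.
device_rule() {
    # $((0x..)) converts the hex strings to decimal.
    printf '%s %s:%s rwm\n' "$1" "$((0x$2))" "$((0x$3))"
}

# /dev/nvidiactl is character device 195:255 (see the mknod call above),
# which is c3:ff in hex.
device_rule c c3 ff   # prints "c 195:255 rwm"
```

On a real host the arguments could be taken from the device node itself, e.g. `device_rule c $(stat -c '%t %T' /dev/nvidiactl)`. A bind mount or a manual mknod makes the node visible but adds no such rule, which would explain the "Operation not permitted" error.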

flx42 avatar Jul 26 '16 22:07 flx42

@flx42 For the container runtime, devices require special handling (a mknod syscall), so mounts won't work. We'll probably have to add some sort of support for this. (cc @crosbymichael)

Ideally, we'd like to be able to schedule over devices, as well.

stevvooe avatar Jul 26 '16 22:07 stevvooe

@stevvooe Already have device support in the runtime, just not exposed in swarm.

cpuguy83 avatar Jul 26 '16 22:07 cpuguy83

> Ideally, we'd like to be able to schedule over devices, as well.

This question was raised here: https://github.com/docker/docker/issues/24750 But the discussion was redirected here: https://github.com/docker/docker/issues/23917, in order to have a single discussion thread.

flx42 avatar Jul 26 '16 22:07 flx42

@stevvooe I quickly hacked together a solution; it's not too difficult: https://github.com/flx42/swarmkit/commit/a82b9fb2b1f3387baa1e4d4447ba9af4f3e05f16 This is not a PR yet. Would you be interested if I opened one? Or are the swarmkit features frozen right now before 1.12? The next step would be to also modify the engine API.

flx42 avatar Jul 28 '16 00:07 flx42

Forgot to mention that I can now run GPU containers by mimicking what nvidia-docker does:

./bin/swarmctl service create --device /dev/nvidia-uvm --device /dev/nvidiactl --device /dev/nvidia0 --bind /var/lib/nvidia-docker/volumes/nvidia_driver/367.35:/usr/local/nvidia --image nvidia/digits:4.0 --name digits

flx42 avatar Jul 28 '16 00:07 flx42

@flx42 I took a quick peek and the PR looks like a decent start. I am not sure about representing these as cluster-level resources for container startup. From an orchestration perspective, we have to match these up with announced resources at the node level, which might be okay. It might be better on ContainerSpec, but I'm not sure yet.

Go ahead and file as a [WIP] PR.

stevvooe avatar Jul 28 '16 00:07 stevvooe

@stevvooe Yeah, that's the biggest discussion point for sure.

In engine-api, devices are resources: https://github.com/docker/engine-api/blob/master/types/container/host_config.go#L249

But in swarmkit, resources are so far "fungible" objects like CPU shares and memory, with a base value and a limit. A device doesn't really fit that definition. For GPU apps we have devices that must be shared (/dev/nvidiactl) and devices that could be exclusively acquired (like /dev/nvidia0).

I decided to initially put devices into resources because there is already a function in swarmkit that creates an engine-api Resource object from a swarm Resource object: https://github.com/docker/swarmkit/blob/master/agent/exec/container/container.go#L301-L324. If devices were moved elsewhere, this method would also need to access the container spec.

I will file a PR soon to continue the discussion.

flx42 avatar Jul 28 '16 00:07 flx42

@flx42 Great!

We really aren't planning on following the same resource model from HostConfig for SwarmKit. In this case, we are instructing the container to mount these devices, which is specific to a container runtime. Other runtimes may not have a container or devices. Thus, I would err on ContainerSpec.

Now, I would like to see scheduling of fungible GPUs, but that might be a wholly separate flow, keeping the initial support narrow. Such services would require manual constraint and device assignment, but you still achieve the goal.

Let's discuss this in the context of the PR.

stevvooe avatar Jul 28 '16 01:07 stevvooe

Thanks @flx42 - I think GPU is definitely something we want to support medium term.

/cc @mgoelzer

aluzzardi avatar Aug 05 '16 01:08 aluzzardi

Thanks @aluzzardi, PR created, it's quite basic.

flx42 avatar Aug 10 '16 01:08 flx42

The --device option is really important for my use case too. I am trying to use swarm to manage 50 Raspberry Pis to do computer vision, but I need to be able to access /dev/video0 to capture images. Without this option, I'm stuck, and have to manage them without swarm, which is painful.

mlhales avatar Dec 27 '16 04:12 mlhales

@mlhales We need someone who is willing to work out the issues with --device in a clustered environment and support that solution, rather than just a drive-by PR. If you or a colleague want to take this on, that would be great, but this isn't as simple as adding --device.

stevvooe avatar Jan 06 '17 22:01 stevvooe

Using --device=/dev/gpiomem would be great on a RPi swarm to access GPIO on each node without privileged mode.

StefanScherer avatar Feb 15 '17 22:02 StefanScherer

Using --device=/dev/fuse would be great for mounting FUSE, which isn't currently possible.

nazar-pc avatar Feb 20 '17 13:02 nazar-pc

We found an easier way for the Blinkt! LED strip: using sysfs. Now we can run Blinkt! in docker swarm mode without privileges.

StefanScherer avatar Feb 20 '17 13:02 StefanScherer

@StefanScherer is it a proper alternative to using e.g. --device=/dev/mem to access GPIO on a RPi? Would love to see an example if you would care to share :)

mathiasimmer avatar Feb 21 '17 09:02 mathiasimmer

@mathiasimmer For the Blinkt! LED strip use case there are only eight RGB LEDs, so using sysfs is not time-critical for these few LEDs. If you want to drive hundreds of them you still need faster GPIO access to get a higher clock rate. For Blinkt! we have forked the Node.js module and adjusted it in this branch: https://github.com/sealsystems/node-blinkt/tree/sysfs. A sample application is available as well, along with how to use this forked module as a dependency in your own package.json.
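For readers unfamiliar with the sysfs approach, here is a rough sketch of driving one GPIO line through the legacy /sys/class/gpio interface. It is not taken from the node-blinkt fork: `gpio_write` is a hypothetical helper, and the pin number in the usage note is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: set one GPIO line high or low via legacy sysfs GPIO.
# $1 = sysfs gpio directory (normally /sys/class/gpio)
# $2 = GPIO number
# $3 = value to write (0 or 1)
gpio_write() {
    base=$1
    pin=$2
    value=$3
    # Ask the kernel to export the pin if its directory does not exist yet.
    if [ ! -d "$base/gpio$pin" ]; then
        echo "$pin" > "$base/export"
    fi
    # Configure the pin as an output, then set its level.
    echo out > "$base/gpio$pin/direction"
    echo "$value" > "$base/gpio$pin/value"
}
```

Usage would look like `gpio_write /sys/class/gpio 17 1`. Each write is a separate file operation, which is why this route is fine for a handful of LEDs but slow for long strips. On a real system the files of a freshly exported pin may take a moment to become writable for non-root users.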

StefanScherer avatar Feb 21 '17 09:02 StefanScherer

/cc @cyli

aluzzardi avatar Feb 22 '17 19:02 aluzzardi

@aluzzardi I think we should resurrect the --device patch. I don't think there is anything in the pipeline that is sophisticated enough to handle proper, cluster-level dynamic resource allocation. Looking back at this issue, there isn't necessarily a model that will work well in all cases (mostly because no one here can seem to enumerate them).

We can always add logic in the scheduler to prevent device contention in the future.

stevvooe avatar Feb 22 '17 19:02 stevvooe

Attempt to add devices to the container spec and plugin spec here: https://github.com/docker/swarmkit/pull/1964

I've no objection to the --device flag - cc @diogomonica ?

cyli avatar Feb 22 '17 23:02 cyli

--device allows any service to escalate privileges. Why would we add this w/out profiles on services?

diogomonica avatar Feb 23 '17 03:02 diogomonica

@diogomonica I thought profiles mainly covered capabilities, etc?

cyli avatar Feb 23 '17 03:02 cyli

@cyli well, if we believe "devices" are easy enough for users to understand and accept, then we might not need them. But we should look critically at adding anything to the command line that allows a container to escalate privileges before we have a good way of informing the user of everything the service will need from a security perspective.

diogomonica avatar Feb 23 '17 04:02 diogomonica

Also following this. Very interested in access to character devices (/dev/bus/usb/...) in a docker swarm. To help some others until this is supported by docker, a workaround for swarm + usb:

  1. On the (Linux) host(s), create a udev rule which creates a symlink to your device (in my case an FTDI device), e.g. in /etc/udev/rules.d/99-libftdi.rules:

SUBSYSTEMS=="usb", ATTRS{idVendor}=="xxxx", ATTRS{idProduct}=="xxxx", GROUP="dialout", MODE="0666", SYMLINK+="my_ftdi", RUN+="/usr/bin/setupdockerusb.sh"

Then reload the udev rules:

sudo udevadm control --reload-rules

When the USB device is connected, the udev manager will create a symlink /dev/my_ftdi -> /dev/bus/usb/xxx/xxx and execute /usr/bin/setupdockerusb.sh.

  2. The /usr/bin/setupdockerusb.sh script (ref). This script sets the character device permissions on (the first) container with the given image name.

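#!/bin/bash
# Resolve the udev symlink to the real USB device node.
USBDEV=$(readlink -f /dev/my_ftdi)
# stat prints the minor (%T) and major (%t) device numbers in hex.
read -r minor major < <(stat -c '%T %t' "$USBDEV")
if [[ -z $minor || -z $major ]]; then
    echo 'Device not found'
    exit 1
fi
dminor=$((0x${minor}))
dmajor=$((0x${major}))
# Pick the first running container created from the given image.
CID=$(docker ps --no-trunc -q --filter ancestor=my/imagename | head -1)
if [[ -z $CID ]]; then
    echo 'CID not found'
    exit 1
fi
echo 'Setting permissions'
# Whitelist the character device in the container's devices cgroup.
echo "c $dmajor:$dminor rwm" > "/sys/fs/cgroup/devices/docker/$CID/devices.allow"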

  3. Create the docker swarm service with the following options: docker service create [...] --mount type=bind,source=/dev/bus/usb,target=/dev/bus/usb [...]

  4. Event listener (systemd service): waits for a container to be started and sets permissions. Run with root permissions on the host.

#!/bin/bash
# Re-run the permission script every time any container starts.
docker events --filter 'event=start' | \
while read -r line; do
    /usr/bin/setupdockerusb.sh
done

brubbel avatar Mar 12 '17 10:03 brubbel

It would be great to add --device to swarm services.

mort1k avatar Apr 26 '17 18:04 mort1k

@flx42, can you let us know whether your patch is available for the latest docker swarm, i.e. whether someone has ported it to the latest docker swarm API (1.24+), where swarmkit is integrated into the docker daemon?

sudharkrish avatar Oct 05 '17 21:10 sudharkrish

@sudharkrish No, it isn't ported AFAIK.

flx42 avatar Oct 05 '17 23:10 flx42

@flx42 what is the current state of this? :)

eyJhb avatar May 15 '18 18:05 eyJhb

Wondering about the current state as well. I was trying to set up a simple at-home swarm environment (so I could manage it with a simple yaml file and a docker stack deploy) and was disappointed to find --device was missing from swarm mode, keeping me from being able to mount my Raspberry Pi camera via swarm.

Cinderhaze avatar May 22 '18 04:05 Cinderhaze

Adding my use case: my company is deploying IoT sensors, and without an equivalent of --device, swarm mode can't be used.

vim-zz avatar May 22 '18 06:05 vim-zz