coreos-nvidia
nvidia-docker v2?
Hi, are there plans to use nvidia-docker v2 (now merged into master as the new official version)?
It is simpler to use: https://github.com/NVIDIA/nvidia-docker/wiki/About-version-2.0
The links above are broken. I guess it's because the 2.0 branch was recently merged into master in https://github.com/NVIDIA/nvidia-docker/commit/fe1874942b896df074ca1b5b819bc6a2ca9e8151
@rporres indeed, I updated my comment.
Does it require any changes? The current version was built for bare docker, not even nvidia-docker 1.0.
Using nvidia-docker v2 would simplify the docker run part; there would be no need to add:
--volumes-from nvidia-driver \
--env PATH=$PATH:/opt/nvidia/bin/ \
--env LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/lib \
$(for d in /dev/nvidia*; do echo -n "--device $d "; done) \
So the required change is in fact to install nvidia-docker v2 in CoreOS and remove the nvidia-driver container.
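To make the difference concrete, here is a rough sketch that prints the two invocations side by side. The flags of the first command are the ones quoted above; `--runtime=nvidia` is the flag nvidia-docker v2 documents for selecting its runtime. The image name `nvidia/cuda`, the `nvidia-smi` command, and the helper function names are assumptions chosen for illustration only.

```shell
#!/bin/sh
# Assumption: any CUDA-capable image would do; nvidia/cuda is only a placeholder.
IMAGE="nvidia/cuda"

# Emit one "--device <node> " flag per nvidia* node in the given directory.
# The directory argument (default /dev) exists only so the helper can be
# tried on a machine without a GPU.
device_flags() {
    dev_dir="${1:-/dev}"
    for d in "$dev_dir"/nvidia*; do
        [ -e "$d" ] || continue   # the glob matched nothing
        printf -- '--device %s ' "$d"
    done
}

# Bare docker: driver volume, PATH/LD_LIBRARY_PATH and every device node
# are passed by hand. $PATH and $LD_LIBRARY_PATH are printed literally here.
legacy_cmd() {
    printf 'docker run --rm --volumes-from nvidia-driver '
    printf -- '--env PATH=$PATH:/opt/nvidia/bin/ '
    printf -- '--env LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/lib '
    device_flags "$1"
    printf '%s nvidia-smi\n' "$IMAGE"
}

# nvidia-docker v2: the nvidia runtime injects devices and libraries itself.
v2_cmd() {
    printf 'docker run --rm --runtime=nvidia %s nvidia-smi\n' "$IMAGE"
}

legacy_cmd /dev
v2_cmd
```

If the nvidia runtime is configured as docker's default runtime, the `--runtime=nvidia` flag can be dropped as well.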
I used the following steps to install nvidia-docker v2 (very hacky though):
- install nvidia driver
- instead of the volume I simply copy the files to the host, e.g.
/usr/bin/docker run --rm --volume /opt/nvidia/current:/output srcd/coreos-nvidia:${VERSION} cp -a /opt/nvidia/. /output/
- install libnvidia-container
- (build and) install nvidia-container-runtime
- create small bash scripts in /run/torcx/bin for nvidia-container{-runtime,-runtime-hook,-cli} to make sure they are accessible by docker and libraries are in LD_LIBRARY_PATH
- create /etc/docker/daemon.json and set default runtime to nvidia
- restart docker
- add the nvidia-docker bash scripts
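The wrapper-script and daemon.json steps above could look roughly like the sketch below. It is not the exact setup used here: the location of the real binaries (`REAL_BIN_DIR`) is hypothetical, the library directory is assumed from the copy step (`/opt/nvidia/current/lib`), and the `ROOT` prefix is an addition so the script writes into a scratch directory instead of touching the host.

```shell
#!/bin/sh
# Sketch only: files are written below $ROOT (a scratch directory by default).
ROOT="${ROOT:-$(mktemp -d)}"
REAL_BIN_DIR="${REAL_BIN_DIR:-/opt/nvidia-container/bin}"    # hypothetical install location
NVIDIA_LIB_DIR="${NVIDIA_LIB_DIR:-/opt/nvidia/current/lib}"  # assumed from the copy step

mkdir -p "$ROOT/run/torcx/bin" "$ROOT/etc/docker"

# One wrapper per binary, so docker finds it on its PATH and the driver
# libraries are on LD_LIBRARY_PATH when the real binary starts.
for name in nvidia-container-runtime nvidia-container-runtime-hook nvidia-container-cli; do
    wrapper="$ROOT/run/torcx/bin/$name"
    cat > "$wrapper" <<EOF
#!/bin/sh
export LD_LIBRARY_PATH="$NVIDIA_LIB_DIR\${LD_LIBRARY_PATH:+:\$LD_LIBRARY_PATH}"
exec "$REAL_BIN_DIR/$name" "\$@"
EOF
    chmod +x "$wrapper"
done

# Register the nvidia runtime and make it the default.
cat > "$ROOT/etc/docker/daemon.json" <<EOF
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/run/torcx/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF

echo "files written below $ROOT"
# On the real host (ROOT=/), docker would then be restarted, e.g.:
#   systemctl restart docker
```

Running it with `ROOT=/` as root would write to the real paths; inspecting the scratch output first is the safer option.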
There is currently only one issue: nvidia-container-runtime somehow has a regression (even though it is built from the same commit as the installed runc) and fails to run containers with docker run --security-opt=no-new-privileges (https://github.com/coreos/bugs/issues/1796).
We have it working as well (nvidia-docker v2 + coreos + k8s device plugin). We will try to clean it up and hopefully be able to share it soonish.
We went for this instead: https://github.com/GoogleCloudPlatform/container-engine-accelerators/pull/54
@lsjostro I would be interested in your previous "nvidia-docker v2 + coreos" version, even if it is not cleaned up or production-ready: nvidia-docker v2 enables sharing GPUs between containers (at the cost of losing k8s scheduling), which device-driver solutions don't support (and won't for the foreseeable future).
In any case, https://github.com/GoogleCloudPlatform/container-engine-accelerators/pull/54 is useful too, thanks for that!