buildkit
Support `RUN --gpus ...` or equivalent.
Related: https://github.com/docker/buildx/issues/239
Hi and thanks for this awesome tool!
We make heavy use of BuildKit (soon buildx) in TensorFlow Addons to run our tests. It's awesome to be able to run all those tests in parallel in isolated environments, even on remote servers. The Docker cache is great too: no need to pull the dependencies again every time, etc.
Something we're stuck with, though, is that we can't run our GPU tests in a Dockerfile (like we do for all the other tests).
Would it be possible to support GPUs for an entire stage? Or, even better, for a single RUN instruction?
Thanks a lot!
cc @tiborvass
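(Purely for illustration, here is the kind of per-instruction syntax being asked for. This is hypothetical: no released Dockerfile frontend supports a --gpus flag on RUN today, and the base image and test path are just example placeholders.)

```dockerfile
# Hypothetical syntax only -- RUN --gpus is NOT supported by any current Dockerfile frontend.
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04
COPY . /src
WORKDIR /src

# Per-instruction GPU access, analogous to `docker run --gpus all`:
RUN --gpus=all python3 -m pytest tests/gpu
```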
How about the privileged frontend in buildkit? That should provide you with a privileged build step that has access to all devices, including the GPUs.
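For anyone who wants to try that route, a minimal sketch of using the insecure entitlement today (assumes a buildx builder created with the entitlement enabled; note this only lifts the sandbox so host devices under /dev become reachable - it does not inject the NVIDIA driver libraries the way the nvidia runtime does):

```dockerfile
# syntax=docker/dockerfile:1-labs
FROM ubuntu:22.04
# Runs this step outside the default security sandbox, so host devices under /dev are visible.
RUN --security=insecure ls -l /dev
```

Built against a builder that allows the entitlement:

```shell
docker buildx create --name insecure-builder \
  --buildkitd-flags '--allow-insecure-entitlement security.insecure'
docker buildx build --builder insecure-builder --allow security.insecure .
```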
@kniec @tonistiigi @tiborvass I want to extend the original #1800 issue. In my case, I'm trying to build libraries with CUDA in Docker on Jetson Nano/Xavier boards, based on their official images. By default, they provide CUDA access only at runtime, but it can also be enabled for the build process by setting nvidia as the default runtime in the Docker daemon settings. As far as I understand, this option auto-mounts the CUDA libraries into containers during the build. It works perfectly until I enable the BuildKit feature: when it reaches steps that require the CUDA libraries, it fails to locate them. Given these nvidia runtime specifics, would the privileged or insecure options still work, or does BuildKit completely override the default runtime?
@sskorol The BuildKit vendored into Docker always uses the runc runtime at the moment. If you replace the binary it may work with other runtimes as well, but this is not recommended, as it means the same Dockerfile might behave differently on separate machines. #1800 does not assume any custom runtime dependency: you can either access/initialize devices directly in the insecure mode in runc, or we build in some initialization helpers like https://github.com/docker/cli/pull/1714 .
Can this be triggered via frontend?
Found that it wants the libs from the Windows side:
LD_LIBRARY_PATH=/usr/lib/wsl/lib/ python3 -c "import torch; print(torch.cuda.is_available())"
Is there no progress on this after a month? This makes custom CUDA acceleration operators for PyTorch in WSL2 Docker impossible.
The footprint this leaves on the deep learning use case is large; I think it gives users just one more reason to leave Microsoft and use Linux for deep learning engineering.
I'm slightly confused as to the scope of this issue - sorry, I'm not familiar with machine learning pipelines, or anything like that, so forgive my naivety!

It seems like RUN --gpus as a feature would behave similarly to docker run --gpus - exposing the GPUs into the runtime? As far as I can tell, this works in moby through containerd, with the CLI support just a simple component on top of that. BuildKit doesn't use moby for spinning up containers for RUN commands and similar; we support using either oci or containerd workers (see buildkitd.toml) - I'm not sure what the status of GPU support is through oci/runc? Regardless, this feels a little tricky to do, but a patch for it would involve similar logic to what moby is already doing - or maybe we could pull that in? Not sure.

The other thing I've seen mentioned is overriding the runtime with nvidia-container-runtime. Apparently this is OCI-compliant and compatible with runc? If that's the case, you should be able to set worker.oci.binary in buildkitd.toml on an external BuildKit instance - that would have the same effect as setting it in Docker's daemon config. Would this resolve the issue? Or is some additional logic needed on top of that to expose GPUs, so that a RUN --gpus flag would still be required?
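If anyone wants to experiment with that approach, a rough sketch of what it could look like (assuming nvidia-container-runtime is installed on the host and really does behave as a drop-in runc replacement; I have not verified that the CUDA libraries actually get mounted into build steps this way):

```toml
# buildkitd.toml - point the OCI worker at the NVIDIA runtime instead of runc
[worker.oci]
  binary = "nvidia-container-runtime"
```

```shell
# Create an external buildx builder that uses the config above
docker buildx create --name nvidia-builder --config ./buildkitd.toml --use
docker buildx build .
```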
I can do a test by adding RUN --gpus=all for now and see if it solves the issues we were having. It is frustrating that documents like
https://docs.nvidia.com/cuda/wsl-user-guide/index.html
https://www.docker.com/blog/wsl-2-gpu-support-for-docker-desktop-on-nvidia-gpus/
do not mention the lack of support for GPUs at build time. The situation is unclear on WSL2, whereas on other distributions the support comes via the nvidia runtime specified globally at the system level.
If RUN --gpus ... is intended to be the ubiquitous, supported mechanism, then it needs to be documented as such, so users on WSL2 can get the expected behavior when using GPU devices at build time.
Any updates? Do we still need to use DOCKER_BUILDKIT=0 to disable BuildKit in order to build GPU-related images today?
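For context, the fallback being referred to is the legacy (pre-BuildKit) builder, which honors the daemon's default runtime (e.g. nvidia set in /etc/docker/daemon.json). A minimal example, with the image tag as a placeholder:

```shell
# Force the legacy builder so the daemon's default (nvidia) runtime is used for build steps
DOCKER_BUILDKIT=0 docker build -t my-gpu-image .
```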
I found myself in this use case again today.
I have a workaround for this. It's not pretty and has some limitations, but works. h/t to @cpuguy83 for the suggestion! Created a repo with details: https://github.com/sozercan/buildkit-nvidia
This is the gist of how the fine-tuning implementation for https://github.com/sozercan/aikit works with docker build --output with NVIDIA GPUs today. For more info: https://sozercan.github.io/aikit/fine-tune
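For anyone unfamiliar with the --output flag mentioned above, here is a minimal sketch of exporting build results to the local filesystem (the GPU wiring itself lives in the linked repo and is not shown here; the destination path is just an example):

```shell
# Export the final stage's filesystem to ./out instead of producing an image
docker buildx build --output type=local,dest=./out .
```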
@jedevc I tried the worker.oci.binary route by creating a custom builder, but unfortunately I could not get it to work. Any more pointers would be appreciated!
Are there any maintainers backing this feature? It seems really strange to have to turn off BuildKit to use GPUs.
Hi @anthonyalayo, yes, this is definitely on the roadmap. We are working on a planned proposal, which we hope to share soon. Is your use case covered by the details mentioned above, or are there additional aspects you would look for in this feature?
@colinhemmings I just got @sozercan's workaround working after some tweaks. I'll definitely be using it in the meantime, but not having to run in insecure mode would be ideal. I'm looking forward to the planned proposal, thanks!