buildkit
Support `RUN --gpus ...` or equivalent.
Related: https://github.com/docker/buildx/issues/239
Hi and thanks for this awesome tool!
We make heavy use of BuildKit (soon buildx) in TensorFlow Addons to run our tests. It's awesome to be able to run all those tests in parallel in isolated environments, even on remote servers. The Docker cache is great too: no need to pull the dependencies again every time, etc.
Something we're stuck with, though, is that we can't run our GPU tests in a Dockerfile (like we do for all the other tests).
Would it be possible to support GPUs for an entire stage? Or, even better, for a single RUN instruction?
Thanks a lot!
cc @tiborvass
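(Purely for illustration, here is the kind of per-instruction syntax being asked for. This is hypothetical: no released Dockerfile frontend supports a --gpus flag on RUN today, and the base image and test path are just example placeholders.)

```dockerfile
# Hypothetical syntax only -- RUN --gpus is NOT supported by any current Dockerfile frontend.
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04
COPY . /src
WORKDIR /src

# Per-instruction GPU access, analogous to `docker run --gpus all`:
RUN --gpus=all python3 -m pytest tests/gpu
```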
How about the privileged frontend in buildkit? That should provide you with a privileged build step that has access to all devices, including the GPUs.
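For anyone who wants to try that route, a minimal sketch of using the insecure entitlement today (assumes a buildx builder created with the entitlement enabled; note this only lifts the sandbox so host devices under /dev become reachable - it does not inject the NVIDIA driver libraries the way the nvidia runtime does):

```dockerfile
# syntax=docker/dockerfile:1-labs
FROM ubuntu:22.04
# Runs this step outside the default security sandbox, so host devices under /dev are visible.
RUN --security=insecure ls -l /dev
```

Built against a builder that allows the entitlement:

```shell
docker buildx create --name insecure-builder \
  --buildkitd-flags '--allow-insecure-entitlement security.insecure'
docker buildx build --builder insecure-builder --allow security.insecure .
```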
@kniec @tonistiigi @tiborvass I want to extend the original #1800 issue. In my case, I'm trying to build libraries with CUDA in Docker on Jetson Nano/Xavier boards, based on their official images. By default, they provide CUDA access only at runtime, but it can also be enabled for the build process by setting nvidia as the default runtime in the Docker daemon settings. As far as I understand, this option auto-mounts the CUDA libraries into containers during the build. It works perfectly until I enable the BuildKit feature: when it reaches steps that require the CUDA libraries, it fails to locate them. Given these nvidia runtime specifics, would the privileged or insecure options still work, or does BuildKit completely override the default runtime?
@sskorol The BuildKit vendored into Docker always uses the runc runtime at the moment. If you replace the binary it may work with other runtimes as well, but this is not recommended, as it means the same Dockerfile might behave differently on separate machines. #1800 does not assume any custom runtime dependency: you can either access/initialize devices directly in the insecure mode in runc, or we build in some initialization helpers like https://github.com/docker/cli/pull/1714 .
Can this be triggered via frontend?
Found that it wants the libs from the Windows side:
LD_LIBRARY_PATH=/usr/lib/wsl/lib/ python3 -c "import torch; print(torch.cuda.is_available())"
Is there no progress on this after a month? This makes custom CUDA acceleration operators for PyTorch in WSL2 Docker impossible.
The footprint this leaves on the deep learning use case is large; I think it gives users just one more reason to leave Microsoft and use Linux for deep learning engineering.
I'm slightly confused as to the scope of this issue - sorry, I'm not familiar with machine learning pipelines, or anything like that, so forgive my naivety!

It seems like RUN --gpus as a feature would behave similarly to docker run --gpus - exposing the GPUs into the runtime? As far as I can tell, this works in moby through containerd, with the CLI support just a simple component on top of that. BuildKit doesn't use moby for spinning up containers for RUN commands and similar; we support using either oci or containerd workers (see buildkitd.toml) - I'm not sure what the status of GPU support is through oci/runc? Regardless, this feels a little tricky to do, but a patch for it would involve similar logic to what moby is already doing - or maybe we could pull that in? Not sure.

The other thing I've seen mentioned is overriding the runtime with nvidia-container-runtime. Apparently this is OCI-compliant and compatible with runc? If that's the case, you should be able to set worker.oci.binary in buildkitd.toml on an external BuildKit instance - that would have the same effect as setting it in Docker's daemon config. Would this resolve the issue? Or is some additional logic needed on top of that to expose GPUs, so that a RUN --gpus flag would still be required?
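If anyone wants to experiment with that approach, a rough sketch of what it could look like (assuming nvidia-container-runtime is installed on the host and really does behave as a drop-in runc replacement; I have not verified that the CUDA libraries actually get mounted into build steps this way):

```toml
# buildkitd.toml - point the OCI worker at the NVIDIA runtime instead of runc
[worker.oci]
  binary = "nvidia-container-runtime"
```

```shell
# Create an external buildx builder that uses the config above
docker buildx create --name nvidia-builder --config ./buildkitd.toml --use
docker buildx build .
```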
I can do a test by adding RUN --gpus=all for now and see if it solves the issues we were having. It is frustrating that documents like
https://docs.nvidia.com/cuda/wsl-user-guide/index.html
https://www.docker.com/blog/wsl-2-gpu-support-for-docker-desktop-on-nvidia-gpus/
do not mention the lack of support for GPUs at build time. The situation is unclear on WSL2, whereas on other distributions the support comes via the nvidia runtime specified globally at the system level.
If RUN --gpus ... is intended to be the ubiquitous, supported mechanism, then it needs to be documented as such, so users on WSL2 can get the expected behavior when using GPU devices at build time.
Any updates? Do we still need to use DOCKER_BUILDKIT=0 to disable BuildKit in order to build GPU-related images today?
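For context, the fallback being referred to is the legacy (pre-BuildKit) builder, which honors the daemon's default runtime (e.g. nvidia set in /etc/docker/daemon.json). A minimal example, with the image tag as a placeholder:

```shell
# Force the legacy builder so the daemon's default (nvidia) runtime is used for build steps
DOCKER_BUILDKIT=0 docker build -t my-gpu-image .
```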
I found myself in this use case again today.
I have a workaround for this. It's not pretty and has some limitations, but works. h/t to @cpuguy83 for the suggestion! Created a repo with details: https://github.com/sozercan/buildkit-nvidia
This is the gist of how the fine-tuning implementation for https://github.com/sozercan/aikit works with docker build --output with NVIDIA GPUs today. For more info: https://sozercan.github.io/aikit/fine-tune
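For anyone unfamiliar with the --output flag mentioned above, here is a minimal sketch of exporting build results to the local filesystem (the GPU wiring itself lives in the linked repo and is not shown here; the destination path is just an example):

```shell
# Export the final stage's filesystem to ./out instead of producing an image
docker buildx build --output type=local,dest=./out .
```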
@jedevc I tried the worker.oci.binary route by creating a custom builder, but unfortunately I could not get it to work. Any more pointers would be appreciated!
Are there any maintainers backing this feature? It seems really strange to have to turn off BuildKit to use GPUs.
Hi @anthonyalayo, yes, this is definitely on the roadmap. We are working on a planned proposal, which we hope to share soon. Is your use case covered by the details mentioned above, or are there additional aspects you would look for in this feature?
@colinhemmings I just got @sozercan's workaround working after some tweaks. I'll definitely be using it in the meantime, but not having to run in insecure mode would be ideal. I'm looking forward to the planned proposal, thanks!