Need a complete guide to building GDRCopy in the official NVIDIA Docker container
If you just follow the instructions in README.md, you'll find that a lot of things don't work. I tried for hours without success.
My Docker image is nvidia/cuda:11.8.0-devel-ubuntu22.04.
After a whole night of trying, I finally found a working set of steps to share:
- Start a gdrcopy-capable container
KERNEL_VERSION=`uname -r`
docker run --gpus=all -it -v /lib/modules/${KERNEL_VERSION}:/lib/modules/${KERNEL_VERSION} -v /usr/src/linux-headers-${KERNEL_VERSION%-*}:/usr/src/linux-headers-${KERNEL_VERSION%-*} -v /usr/src/linux-headers-${KERNEL_VERSION}:/usr/src/linux-headers-${KERNEL_VERSION} --privileged nvidia/cuda:11.8.0-devel-ubuntu22.04
- Compile, install, and load the kernel module inside the container
apt update --fix-missing
apt install git check pkg-config nvidia-dkms-<your-nvidia-driver-version>
git clone https://github.com/NVIDIA/gdrcopy
cd gdrcopy
make prefix=/ all install
./insmod.sh
- Check that it works
sanity
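The docker run command above bind-mounts three host directories, and the build fails if any of them is missing on the host. The sketch below checks for them before starting the container. The function name `check_gdr_mount_sources` and its optional root-prefix argument are hypothetical, introduced only for illustration, and the directory layout assumes Ubuntu-style header packages. Note that `${KERNEL_VERSION%-*}` strips the last `-` suffix, so `5.15.0-56-generic` becomes `5.15.0-56`.

```shell
#!/bin/sh
# Sketch: verify that the host directories bind-mounted by the docker run
# command exist. The second argument is a hypothetical root prefix, empty by
# default, so the check can be pointed at a test directory.
check_gdr_mount_sources() {
    kver="$1"        # kernel release, e.g. the output of `uname -r`
    root="${2:-}"    # optional prefix (assumption, for testing only)
    missing=0
    for p in \
        "${root}/lib/modules/${kver}" \
        "${root}/usr/src/linux-headers-${kver%-*}" \
        "${root}/usr/src/linux-headers-${kver}"
    do
        if [ ! -d "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the host, before docker run:
#   check_gdr_mount_sources "$(uname -r)" || echo "install the kernel headers first"
```

If a path is reported missing, installing the kernel headers package that matches the running kernel on the host is the likely fix.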
Hi @goooxu,
We don't have official documentation for installing GDRCopy in Docker, but I can provide some guidelines here.
GDRCopy is composed of two main components: 1) the gdrdrv driver and 2) libgdrapi. The driver needs to be installed on the host (outside the container). After installation, you should see /dev/gdrdrv on the host. Mount /dev/gdrdrv into your container, then check inside the container that /dev/gdrdrv is visible and refers to /dev/gdrdrv on the host.
For libgdrapi, you can install it in your container only; there is no need to install it on the host.
The steps you provided work as well. You should be able to install gdrdrv because you run the container in privileged mode. However, you may want to avoid doing this in shared environments.
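A minimal sketch of the host-driver approach described above, under two assumptions: that /dev/gdrdrv shows up as a character device once the driver is loaded, and that Docker's `--device` flag is used to expose it to an unprivileged container. The helper names `gdrdrv_present` and `gdrdrv_docker_cmd` are hypothetical. The second helper only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: check for the gdrdrv device node and print a docker command that
# passes it into a container without --privileged.

# Return 0 if the given path (default /dev/gdrdrv) is a character device.
gdrdrv_present() {
    dev="${1:-/dev/gdrdrv}"
    [ -c "$dev" ]
}

# Print (do not run) a docker command that exposes /dev/gdrdrv via --device.
# The image defaults to the one used earlier in this thread.
gdrdrv_docker_cmd() {
    image="${1:-nvidia/cuda:11.8.0-devel-ubuntu22.04}"
    echo "docker run --gpus=all -it --device=/dev/gdrdrv $image"
}

# Usage:
#   on the host:           gdrdrv_present && gdrdrv_docker_cmd
#   inside the container:  gdrdrv_present || echo "/dev/gdrdrv not visible"
```

Running the same check on the host and inside the container covers both halves of the advice: the driver is loaded on the host, and the device node is visible in the container.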