Gemfield

28 comments by Gemfield

With nvidia-container-toolkit == 1.3.0-1:

```bash
gemfield@ai02:~$ cat /var/log/nvidia-container-toolkit.log
-- WARNING, the following logs are for debugging purposes only --
I0629 12:23:01.770400 601381 nvc.c:282] initializing library context (version=1.3.0, build=16315ebdf4b9728e899f615e208b50c41d7a5d15)
I0629 12:23:01.770499...
```

It seems the new version of nvidia-container-toolkit performs extra mounts:

```bash
I0629 12:10:31.042580 930305 nvc_mount.c:112] mounting /var/lib/docker/overlay2/dd8d1c44a88df34c3257d7d6cc323c206a57a70abb108ebc389456002466b76b/merged/usr/local/cuda-11.3/compat/libcuda.so.465.19.01 at /var/lib/docker/overlay2/dd8d1c44a88df34c3257d7d6cc323c206a57a70abb108ebc389456002466b76b/merged/usr/lib/x86_64-linux-gnu/libcuda.so.465.19.01
I0629 12:10:31.042688 930305 nvc_mount.c:112] mounting /var/lib/docker/overlay2/dd8d1c44a88df34c3257d7d6cc323c206a57a70abb108ebc389456002466b76b/merged/usr/local/cuda-11.3/compat/libnvidia-ptxjitcompiler.so.465.19.01 at /var/lib/docker/overlay2/dd8d1c44a88df34c3257d7d6cc323c206a57a70abb108ebc389456002466b76b/merged/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.465.19.01
```
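For anyone comparing hosts, a quick way to pull only these compat mounts out of the toolkit log is sketched below. It is a rough helper, not part of nvidia-container-toolkit: the function name `compat_mounts` is made up for illustration, and it assumes the log lines look like the ones quoted above (tagged `nvc_mount.c` and containing a `/compat/` path) and that the log lives at the path shown in the earlier comment.

```shell
# Hypothetical helper: list mount operations whose source is a CUDA compat directory.
# Assumes the log format quoted above; pass a different log path as $1 if needed.
compat_mounts() {
    log_file="${1:-/var/log/nvidia-container-toolkit.log}"
    # Keep only mount lines, then only those that mention a compat directory.
    grep 'nvc_mount.c' "$log_file" | grep '/compat/'
}
```

Running `compat_mounts` on both ai01 and ai02 and comparing the output should show whether only one toolkit version mounts the compat libraries into the container.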

To reproduce this issue, I used two host machines (ai01 and ai02) with different nvidia-container-toolkit versions. Both ai01 and ai02 have the same OS, NVIDIA driver, and 1080 Ti CUDA device.

## host...

@klueska Are you saying the ai02 container "worked" only because nvidia-container-toolkit == 1.3.0-1 happens to contain the bug? And that, rather than "working", it should have thrown a different error? But then why does the code below work...

After further debugging (https://zhuanlan.zhihu.com/p/361545761), I found that:

## ai01 container:

```bash
root@1d0d6b4ec38d:/.gemfield_install# ls -l /usr/lib/x86_64-linux-gnu/libcuda.so*
lrwxrwxrwx 1 root root 12 6月 30 00:09 /usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 20...
```

@klueska There are no cuda-compat packages on the ai01 host OS (Ubuntu 20.04):

```bash
gemfield@ai01:~$ dpkg -l | grep -i cuda | grep -i compat
gemfield@ai01:~$ dpkg -l | grep -i nvidia |...
```

@klueska Thanks, I will give it a try. In the meantime, will this be fixed or improved in the next libnvidia-container release?

@guillaumekln I think so. I have already used the same workaround as the solution in the MLab HomePod project: https://github.com/DeepVAC/MLab/blob/6479b74dcb9fe3d598658f41f6f1c6dec7fd71a4/docker/homepod/Dockerfile.pro#L9