
CUDA and tensorflow-gpu on WSL2

muratmaga opened this issue on Dec 11, 2020 · 0 comments

This is not a bug report, just a comment.

It appears that it is now possible to run tensorflow-gpu with Docker [1] or even without Docker [2] through WSL2. This requires a specific version of the NVIDIA driver on Windows. I have lost the link NVIDIA provided for the WSL2-specific drivers, but I am using 465.12, which is not even listed on the regular driver download page.
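Before involving TensorFlow at all, here is a minimal sanity check I would run from R to see whether the WSL2 GPU plumbing is visible inside the distro. The /dev/dxg device node and the /usr/lib/wsl/lib driver directory are assumptions on my part about where the WSL2 CUDA preview exposes the Windows driver, so the exact paths may differ between builds:

# Assumed locations for the WSL2 GPU paravirtualization device and the
# user-mode driver libraries mounted from the Windows side.
wsl_gpu_device <- "/dev/dxg"
wsl_driver_dir <- "/usr/lib/wsl/lib"

file.exists(wsl_gpu_device)                              # GPU device visible?
dir.exists(wsl_driver_dir)                               # driver libraries mounted?
file.exists(file.path(wsl_driver_dir, "libcuda.so.1"))   # driver API library present?

# The dynamic loader reads LD_LIBRARY_PATH when the process starts, so the
# driver directory has to be on the path before R is launched, e.g.
#   export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
grepl(wsl_driver_dir, Sys.getenv("LD_LIBRARY_PATH"), fixed = TRUE)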

I tried to follow the instructions and experimented with Ubuntu 20.04 LTS. I got as far as this:

> library(tensorflow)
> tf$config$list_physical_devices("GPU")
2020-12-11 11:08:30.076717: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-12-11 11:08:32.100652: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/maga/.local/share/r-miniconda/envs/r-reticulate/lib:/usr/lib/R/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/default-java/lib/server
2020-12-11 11:08:32.100688: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-12-11 11:08:32.100701: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (Maga-XPS15): /proc/driver/nvidia/version does not exist
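If I read this correctly, TensorFlow finds the CUDA runtime (libcudart.so.10.1) but not the driver library libcuda.so.1, and the LD_LIBRARY_PATH it prints does not include any WSL driver directory, which is presumably why it then concludes the kernel driver is not running. One thing that might be worth trying (untested by me, and the path is the same assumption as above) is loading the library by full path before tensorflow attaches:

# Untested workaround sketch: if libcuda.so.1 exists but is not on the
# search path, loading it by full path with global symbol visibility may
# let TensorFlow's later dlopen("libcuda.so.1") reuse the loaded copy.
libcuda <- "/usr/lib/wsl/lib/libcuda.so.1"   # assumed WSL2 location
if (file.exists(libcuda)) {
  dyn.load(libcuda, local = FALSE)
}
library(tensorflow)
tf$config$list_physical_devices("GPU")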

I get a similar error on the Docker side as well:


maga@Maga-XPS15:~$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
ERRO[0000] error waiting for container: context canceled
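As far as I understand NVIDIA's guide, GPU containers are supposed to go through a dockerd running inside the WSL2 distro with the NVIDIA Container Toolkit installed there, not through Docker Desktop's daemon, and this "nvidia-container-cli ... driver error" seems to mean that whatever daemon is answering cannot see the driver. A quick check of which daemon the client is talking to (wrapped in R just to keep everything in one session; the Go-template field name is my assumption and may vary across Docker versions):

# Ask the daemon to identify itself. "Docker Desktop" would suggest the
# Windows-side daemon is answering rather than dockerd inside this distro.
daemon_os <- system2("docker", c("info", "--format", "{{.OperatingSystem}}"),
                     stdout = TRUE, stderr = TRUE)
cat(daemon_os, sep = "\n")

# Once the in-distro daemon and the NVIDIA Container Toolkit are set up,
# the same sample as above can be re-run:
# system2("docker", c("run", "--gpus", "all",
#                     "nvcr.io/nvidia/k8s/cuda-sample:nbody",
#                     "nbody", "-gpu", "-benchmark"))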

The instructions are flaky, and I am not all that familiar with CUDA or TensorFlow to begin with, so at this point I am not sure what to change or try. I am bringing this up because, if someone more knowledgeable figures out the instructions, it could greatly facilitate the use of the whole ANTsR toolkit on Windows. I am also happy to help test.

[1] https://docs.nvidia.com/cuda/wsl-user-guide/index.html
[2] https://stackoverflow.com/questions/63679865/install-tensorflow-gpu-on-wsl2
