handson-ml3
[BUG] TensorFlow cannot find the GPU with "docker-compose up", but it works with "docker run". nvidia-smi shows the GPU but reports a CUDA version error.
Describe the bug
TensorFlow cannot find the GPU when the container is started with "docker-compose up", but it works with "docker run".
To Reproduce
make run
make exec
ipython
Then I get the following output:
In [1]: import tensorflow as tf
2023-04-30 20:38:26.804632: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-30 20:38:27.019561: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
In [2]: tf.config.list_physical_devices('GPU')
2023-04-30 20:38:30.584296: E tensorflow/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (34)
2023-04-30 20:38:30.584411: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 3b0aa2bc90a6
2023-04-30 20:38:30.584443: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 3b0aa2bc90a6
2023-04-30 20:38:30.584559: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: NOT_FOUND: was unable to find libcuda.so DSO loaded into this program
2023-04-30 20:38:30.584654: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 530.30.2
Out[2]: []
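The "unable to find libcuda.so" line suggests a check that does not involve TensorFlow at all. The sketch below tries to load the CUDA driver library with `ctypes` and call `cuInit(0)`. The helper name `probe_libcuda` and the candidate library names are assumptions made for illustration; they are not part of this project.

```python
import ctypes


def probe_libcuda(names=("libcuda.so.1", "libcuda.so")):
    """Try to load the CUDA driver library and call cuInit(0).

    Returns (loaded_name, cu_init_result). Both are None if no candidate
    library can be loaded; the result is None if cuInit is missing.
    """
    for name in names:
        try:
            lib = ctypes.CDLL(name)
        except OSError:
            continue  # not on the loader path, try the next candidate
        try:
            cu_init = lib.cuInit
        except AttributeError:
            return name, None  # library loaded but has no cuInit symbol
        cu_init.argtypes = [ctypes.c_uint]
        cu_init.restype = ctypes.c_int
        return name, cu_init(0)  # 0 means CUDA_SUCCESS
    return None, None


if __name__ == "__main__":
    loaded, result = probe_libcuda()
    if loaded is None:
        print("libcuda could not be loaded inside this container")
    else:
        print(f"loaded {loaded}, cuInit returned {result}")
```

Running this in both the "docker run" container and the "docker-compose up" container would show whether the driver library is visible to the dynamic loader in each case. A nonzero `cuInit` result is a CUDA driver error code and would need to be looked up in the `cuda.h` header of the installed toolkit.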
The GPU is exposed to the container, which can be verified with nvidia-smi. However, nvidia-smi also reports an error for the CUDA version:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: ERR!     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090         On | 00000000:81:00.0 Off |                  N/A |
| 30%   36C    P8               25W / 350W|      6MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
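To compare the two launch methods without reading the table by eye, the header line of the nvidia-smi output can be parsed. This is a minimal sketch: the function name `parse_smi_header` is hypothetical, and the regular expression assumes the header layout shown in this report, which may differ in other nvidia-smi versions.

```python
import re

# Matches the header row printed by nvidia-smi, as it appears in this report.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>\S+)\s+"
    r"Driver Version:\s*(?P<driver>\S+)\s+"
    r"CUDA Version:\s*(?P<cuda>\S+)"
)


def parse_smi_header(text):
    """Extract tool, driver, and CUDA versions from nvidia-smi output.

    Returns None if no header row is found. The 'cuda_ok' flag is True only
    when the CUDA field looks like a dotted version number.
    """
    match = HEADER_RE.search(text)
    if match is None:
        return None
    info = match.groupdict()
    info["cuda_ok"] = re.fullmatch(r"\d+(\.\d+)*", info["cuda"]) is not None
    return info


if __name__ == "__main__":
    # Header row copied from the output above.
    sample = "| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: ERR! |"
    print(parse_smi_header(sample))
```

In practice the text would come from running nvidia-smi inside each container, for example with `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.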
This is my docker-compose.yml:
version: "3"
services:
  handson-ml3:
    build:
      context: ../
      dockerfile: ./docker/Dockerfile.gpu #Dockerfile
      args:
        - username=devel
        - userid=1000
    container_name: handson-ml3
    image: ageron/handson-ml3:latest-gpu #latest
    restart: unless-stopped
    logging:
      driver: json-file
      options:
        max-size: 50m
    ports:
      - "8888:8888"
      - "8890:8890"
      - "6006:6006"
    volumes:
      - ../:/home/devel/handson-ml3
    command: /opt/conda/envs/homl3/bin/jupyter-lab --ip=0.0.0.0 --port=8890 --no-browser
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu, utility]
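One difference worth ruling out between "docker run" and this compose file is the set of driver capabilities that the container actually receives. The sketch below reads the `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` environment variables, which the NVIDIA Container Toolkit uses to decide which devices and driver libraries to expose. The helper name `nvidia_runtime_env` is hypothetical, and it is an unconfirmed assumption that the `capabilities` list above is related to the problem.

```python
import os


def nvidia_runtime_env(environ=None):
    """Summarize the NVIDIA Container Toolkit variables seen by this process.

    'compute_requested' is True if 'compute' or 'all' is listed, False if
    other capabilities are listed without it, and None if the variable is
    unset or empty (the toolkit then applies its own default).
    """
    environ = os.environ if environ is None else environ
    visible = environ.get("NVIDIA_VISIBLE_DEVICES")
    raw = environ.get("NVIDIA_DRIVER_CAPABILITIES", "")
    caps = sorted({c.strip() for c in raw.split(",") if c.strip()})
    if not caps:
        compute = None
    else:
        compute = "compute" in caps or "all" in caps
    return {
        "visible_devices": visible,
        "capabilities": caps,
        "compute_requested": compute,
    }


if __name__ == "__main__":
    # Run inside each container and compare the two results.
    print(nvidia_runtime_env())
```

If the two containers report different capability lists, that would be a concrete lead for comparing the compose `deploy` section with the flags passed to "docker run".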