
No CUDA device found; using CPU as fallback.

Open Fedomer opened this issue 1 year ago • 9 comments

On first use, at the very first cell [1]: GPU Configuration and Imports of the Sionna_Ray_Tracing_Introduction tutorial, no GPU is found: No CUDA device found; using CPU as fallback.

but !nvidia-smi prints:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off |   00000000:01:00.0 Off |                    0 |
| N/A   42C    P0             37W /  250W |     425MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off |   00000000:81:00.0 Off |                    0 |
| N/A   51C    P0             47W /  250W |       1MiB /  40960MiB |      5%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

I use a container built from the provided Docker image. Other RAPIDS Docker images work fine. Could it be a driver problem?

Fedomer avatar Oct 04 '24 21:10 Fedomer

Hello @Fedomer,

Sionna uses Mitsuba for its ray tracing capabilities, which itself uses OptiX under the hood. For OptiX to load, the Docker container needs to enable support for it. I am not a Docker expert, but I think that enabling the graphics driver capabilities should help: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html#driver-capabilities
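
For example (untested; the image name and port are placeholders, and this assumes the NVIDIA Container Toolkit is installed on the host), something along these lines should request the relevant capabilities when starting the container:

docker run --gpus all --env NVIDIA_DRIVER_CAPABILITIES=graphics,compute,utility -it -p 8888:8888 sionna:latest

The graphics capability is the one that should matter for OptiX here; compute and utility cover CUDA and tools such as nvidia-smi.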

merlinND avatar Oct 07 '24 09:10 merlinND

Hello @merlinND, thank you, I did that. I created my container following the tutorial: podman container create --name Sionna --device nvidia.com/gpu=all -it -p 8888:8888 --privileged=true --env NVIDIA_DRIVER_CAPABILITIES=graphics,compute,utility localhost/sionna:latest

NB: Podman uses the same flags as Docker and works fine for the RAPIDS images.

Fedomer avatar Oct 07 '24 11:10 Fedomer

Glad it worked!

merlinND avatar Oct 07 '24 13:10 merlinND

Hello @merlinND, I've done it but it didn't work! I'm still investigating. I will try on a different machine with a different OS (Ubuntu 20.04; right now I use Red Hat Enterprise Linux 9.4 with Podman).

the "No CUDA device found; " appears when I do : import sionna

Fedomer avatar Oct 07 '24 13:10 Fedomer

Could you please run this inside the Docker container and give us the result?

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

gmarcusm avatar Oct 07 '24 16:10 gmarcusm

Hi @gmarcusm, thanks.

# python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-10-07 16:48:56.624726: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-07 16:48:56.624791: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-07 16:48:56.626089: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-07 16:48:56.632935: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]

Also with import sionna:

# python3
Python 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sionna
2024-10-07 16:59:29.563043: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-07 16:59:29.563161: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-07 16:59:29.564489: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-07 16:59:29.571596: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
No CUDA device found; using CPU as fallback.

It seems that TensorFlow is not GPU-enabled! But it is the official build, using the provided Dockerfile.

Fedomer avatar Oct 07 '24 16:10 Fedomer

Update

The Docker container loads the sionna package fine (with CUDA) on a computer running Ubuntu 20.04 LTS with an NVIDIA A5000 card and driver | NVIDIA-SMI 470.256.02 Driver Version: 470.256.02 CUDA Version: 12.3 |, but shows this strange issue on a GPU rack server with dual A100 GPUs running Red Hat Enterprise Linux 9.4 with Podman as the container engine. The drivers on RHEL 9.4 are | NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |.

Other containers with a more recent TensorFlow (RAPIDS) work fine.

Still investigating......

**... after some investigating:** It seems that the problem lies with the container engine, Podman. On a different Linux distribution with Docker, the environment works! Contacting Red Hat about this issue is the next step... still investigating.
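
One more thing that might be worth checking on the Podman side (only a guess, not a confirmed fix): with --device nvidia.com/gpu=all, Podman relies on a CDI specification generated by the NVIDIA Container Toolkit, and that spec determines which driver libraries get mounted into the container. Regenerating it after a driver update, then re-testing, would rule out a stale spec:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
podman run --rm --device nvidia.com/gpu=all localhost/sionna:latest nvidia-smi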

Fedomer avatar Oct 08 '24 07:10 Fedomer

Hi @Fedomer, did you solve this issue? Were you able to get it to work on Red Hat Linux? I am facing the same problem: TF by itself is able to find a GPU, but once I pip install sionna it is not able to find a GPU anymore. Not sure if Sionna downgrades the TF version and messes things up in the process.
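
To see whether the TensorFlow installation actually changed, a quick before/after check (plain pip/TF commands, nothing Sionna-specific) could be:

pip list | grep -i -E "tensorflow|sionna"
python3 -c "import tensorflow as tf; print(tf.__version__); print(tf.config.list_physical_devices('GPU'))"

If the TensorFlow version differs after installing Sionna, pip has replaced the GPU-enabled build; reinstalling a GPU-enabled wheel (e.g. pip install "tensorflow[and-cuda]") might restore it.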

csankar69 avatar Jan 09 '25 02:01 csankar69

Hello @csankar69, for the moment I'm using Docker, because I think it's a problem with Podman's GPU management or a bad Sionna configuration. I'm waiting for a new server with Red Hat and I will try again. Red Hat couldn't solve the problem.

Fedomer avatar Jan 09 '25 09:01 Fedomer

Closing due to inactivity

SebastianCa avatar Aug 21 '25 09:08 SebastianCa