singularity icon indicating copy to clipboard operation
singularity copied to clipboard

Nvidia GPU not detectable in TensorFlow container

Open SuhasSrinivasan opened this issue 6 months ago • 2 comments

Before you report an issue...

Version of Singularity

What version of Singularity are you using? Run:

$ singularity --version
singularity-ce version 4.3.1-noble

Describe the bug

Dear Developers,

Thank you for developing Singularity!

After following the official instructions for Singularity GPU support, with or without --nvccli, the GPU is not available to the guest environment.

The following Nvidia prerequisites have been installed

  1. Nvidia drivers
  2. Nvidia CUDA Toolkit
  3. Nvidia Container Toolkit

To Reproduce

$ singularity run --nv --nvccli tensorflow_latest-gpu.sif
INFO:    Setting 'NVIDIA_VISIBLE_DEVICES=all' to emulate legacy GPU binding.
INFO:    Setting --writable-tmpfs (required by nvidia-container-cli)

________                               _______________
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ /
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/


You are running this container as user with ID 1000 and group 1000,
which should map to the ID and group for your user on the Docker host. Great!

/sbin/ldconfig.real: Can't create temporary cache file /etc/ld.so.cache~: Permission denied
Singularity> /usr/bin/python
Python 3.11.11 (main, Dec  4 2024, 08:55:07) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2025-06-10 15:50:59.413177: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-06-10 15:50:59.708828: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1749595859.808923  290239 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1749595859.831656  290239 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1749595860.065125  290239 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1749595860.065149  290239 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1749595860.065153  290239 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1749595860.065156  290239 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-06-10 15:51:00.089035: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> print(tf.config.list_physical_devices('GPU'))
W0000 00:00:1749596006.672872  290239 gpu_device.cc:2341] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
Singularity> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0

Singularity> nvidia-smi
Tue Jun 10 17:51:42 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000               Off |   00000000:01:00.0 Off |                  Off |
| 30%   26C    P8             12W /  300W |      35MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2175      G   /usr/lib/xorg/Xorg                        9MiB |
|    0   N/A  N/A            2304      G   /usr/bin/gnome-shell                      5MiB |
+-----------------------------------------------------------------------------------------+

Nvidia Container CLI path has been configured in /etc/singularity/singularity.conf. The TensorFlow GPU detection issue is the same with or without the option: --nvccli.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Tue_May_27_02:21:03_PDT_2025
Cuda compilation tools, release 12.9, V12.9.86
Build cuda_12.9.r12.9/compiler.36037853_0

$ nvidia-container-cli --version
cli-version: 1.17.8
lib-version: 1.17.8
build date: 2025-05-30T13:47+00:00
build revision: 6eda4d76c8c5f8fc174e4abca83e513fb4dd63b0
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

Expected behavior

Is there any other configuration that is necessary to make the GPU be available through Singularity to the container environment?

OS / Linux Distribution

Which Linux distribution are you using? Is it up-to-date?

$ cat /etc/os-release
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.2 LTS (Noble Numbat)"
VERSION_CODENAME=noble

Installation Method

GitHub debian binary

SuhasSrinivasan avatar Jun 11 '25 01:06 SuhasSrinivasan

This is a commonly reported issue in the tensorflow GitHub issues, and is not specific to Singularity. I would suggest seeking help in a tensorflow support community, and / or trying older versions of tensorflow.

dtrudg avatar Aug 26 '25 08:08 dtrudg

Hi @dtrudg ,

Thank you for reviewing and for the suggestion. Will follow up there as well.

SuhasSrinivasan avatar Aug 27 '25 17:08 SuhasSrinivasan