
Python TensorFlow in NVIDIA-enabled Tumbleweed and Fedora distroboxes cannot talk to the GPU

Status: Open · alexispurslane opened this issue 2 months ago · 11 comments

Symptoms

The result is the same in every combination I have tried. The container can be an ephemeral Fedora Rawhide or Fedora 39 distrobox created with --nvidia, or the Tumbleweed distrobox I created with distrobox-assemble and nvidia=true. TensorFlow can be installed into a Python venv with pip install tensorflow[and-cuda], or system-wide with pip install --break-system-packages tensorflow[and-cuda]. In each case, after a fresh install of those packages, I get this output when I try to use TensorFlow with my GPU:
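Because tensorflow[and-cuda] pulls the CUDA libraries in as pip wheels, one thing worth comparing between the venv install and the --break-system-packages install is where those wheels actually landed. The sketch below assumes (this is not confirmed in the report) that the NVIDIA wheels unpack their shared libraries into site-packages/nvidia/<package>/lib; find_nvidia_lib_dirs is a hypothetical helper name.

```python
import glob
import os
import site
import sys


def find_nvidia_lib_dirs(site_dirs=None):
    """Return sorted nvidia/<package>/lib directories under the given site-packages dirs.

    Assumption: the NVIDIA wheels pulled in by tensorflow[and-cuda]
    unpack their shared libraries into site-packages/nvidia/<package>/lib.
    """
    if site_dirs is None:
        # System/venv site-packages plus the per-user one, since a non-root
        # pip install --break-system-packages may end up in ~/.local.
        site_dirs = site.getsitepackages() + [site.getusersitepackages()]
    found = []
    for base in site_dirs:
        pattern = os.path.join(base, "nvidia", "*", "lib")
        found.extend(p for p in glob.glob(pattern) if os.path.isdir(p))
    return sorted(found)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a venv:", sys.prefix != sys.base_prefix)
    for lib_dir in find_nvidia_lib_dirs():
        print(lib_dir)
```

Run it with the same python3 you use for TensorFlow. An empty list would suggest the CUDA extra did not install anywhere this interpreter looks.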

$ python3
Python 3.12.2 (main, Feb 21 2024, 00:00:00) [GCC 14.0.1 20240217 (Red Hat 14.0.1-0)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-04-12 14:29:54.089365: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-12 14:29:54.126693: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-12 14:29:54.786606: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>> tf.config.list_logical_devices()
2024-04-12 14:29:57.714446: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-04-12 14:29:57.714953: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[LogicalDevice(name='/device:CPU:0', device_type='CPU')]
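The key line is the "Cannot dlopen some GPU libraries" warning, but the log does not say which libraries failed. A rough way to narrow it down from inside the distrobox is to try loading candidate libraries with ctypes. The sonames below are assumptions (they correspond to a CUDA 12 / cuDNN 8 build and may not match the installed TensorFlow), and TensorFlow may search extra locations of its own, so a failure here is a hint rather than proof.

```python
import ctypes

# Assumed sonames for a CUDA 12 / cuDNN 8 TensorFlow build; adjust to match
# the CUDA/cuDNN versions your TensorFlow wheel was built against.
CANDIDATE_LIBS = [
    "libcuda.so.1",     # driver library, expected to come from the host via --nvidia
    "libcudart.so.12",  # CUDA runtime
    "libcublas.so.12",  # cuBLAS
    "libcudnn.so.8",    # cuDNN
]


def probe_libraries(names):
    """Try to dlopen each name; map it to None on success or to the loader's error text."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError as exc:
            results[name] = str(exc)
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    for name, error in probe_libraries(CANDIDATE_LIBS).items():
        print(f"{name}: {'ok' if error is None else 'FAILED: ' + error}")
```

If libcuda.so.1 fails, the --nvidia integration is the first thing to look at; if only the toolkit libraries fail, the pip-installed copies are probably not on the loader's search path.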

Steps to reproduce

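  1. Create a distrobox with NVIDIA integration enabled, either Tumbleweed or Fedora (other images are probably affected too)
  2. Install TensorFlow with CUDA support (tensorflow[and-cuda])
  3. In python3, run import tensorflow as tf; tf.config.list_logical_devices()
  4. Observe that only the CPU device is listed and GPU registration is skipped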
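Steps 1 and 3 can be checked together in one non-interactive script, which makes it easier to compare the Fedora and Tumbleweed boxes side by side. This is a sketch: treating /dev/nvidia* nodes as evidence that --nvidia passthrough worked is an assumption, and both helper names are hypothetical.

```python
import importlib.util
import os


def nvidia_device_nodes(dev_dir="/dev"):
    """Return NVIDIA device node names (e.g. nvidiactl, nvidia0) visible in dev_dir.

    Assumption: a working --nvidia setup exposes these nodes inside the container.
    """
    try:
        entries = os.listdir(dev_dir)
    except FileNotFoundError:
        return []
    return sorted(e for e in entries if e.startswith("nvidia"))


def tensorflow_gpu_devices():
    """Return TensorFlow's GPU logical devices, or None if TensorFlow is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf
    return tf.config.list_logical_devices("GPU")


if __name__ == "__main__":
    print("NVIDIA device nodes:", nvidia_device_nodes() or "none")
    gpus = tensorflow_gpu_devices()
    if gpus is None:
        print("TensorFlow is not installed for this interpreter")
    else:
        print("GPU devices:", gpus or "none (CPU only, as in the report)")
```

Save it under any name and run it with python3 inside each distrobox.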

alexispurslane · Apr 12 '24 18:04