
New optimizers fail to load CUDA installed through conda

Open drasmuss opened this issue 2 years ago • 14 comments

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04 (WSL)
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.11
  • Python version: 3.9
  • Bazel version (if compiling from source): N/A
  • GPU model and memory: RTX 2080 Ti
  • Exact command to reproduce:
  1. Create a new environment, following the official installation instructions from here https://www.tensorflow.org/install/pip#linux:
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
pip install tensorflow
  2. Run the beginner MNIST tutorial (or any other tutorial that calls fit) from here https://keras.io/examples/vision/mnist_convnet/

Describe the problem.

An error is raised:

libdevice not found at ./libdevice.10.bc

Note that if you switch to the legacy optimizers by changing this line

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

to this

model.compile(loss="categorical_crossentropy", optimizer=keras.optimizers.legacy.Adam(), metrics=["accuracy"])

then the example runs successfully.
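
For reference, here is a condensed, self-contained version of the failing/working pair (a minimal sketch only; the Dense layer below is a stand-in for the tutorial's convnet):

# Fails with "libdevice not found at ./libdevice.10.bc" in the environment above;
# switch to the commented-out legacy compile() call and it runs.
python - <<'PY'
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(10, input_shape=(10,))])

model.compile(loss="mse", optimizer="adam")
# model.compile(loss="mse", optimizer=keras.optimizers.legacy.Adam())

model.fit(tf.ones((32, 10)), tf.ones((32, 10)))
PY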

Describe the current behavior.

An error occurs when running the example.

Describe the expected behavior.

The example should run without error, as it does when using the legacy optimizers.

  • Do you want to contribute a PR? (yes/no): no

Standalone code to reproduce the issue.

https://keras.io/examples/vision/mnist_convnet/

Source code / logs.

Full stack trace of the error:

    File ".../tmp.py", line 47, in <module>
      model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1027, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_4'
libdevice not found at ./libdevice.10.bc
         [[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_1026]

Likely related to:

  • https://github.com/tensorflow/tensorflow/issues/56927
  • https://github.com/tensorflow/tensorflow/issues/59013

drasmuss avatar Jan 13 '23 16:01 drasmuss

@gowthamkpr, I tried to execute the mentioned code in two different ways as below, but could not reproduce the issue. Kindly find the gist of it here.

model.compile(loss=tf.losses.mse, optimizer=tf.keras.optimizers.SGD())

and

model.compile(loss=tf.losses.mse, optimizer=tf.keras.optimizers.legacy.SGD())

tilakrayal avatar Jan 17 '23 09:01 tilakrayal

It doesn't look like your gist is following step 1 of the reproduction instructions above (i.e., create a new environment and install CUDA through conda).

drasmuss avatar Jan 17 '23 12:01 drasmuss

I have been encountering the same issue as @drasmuss with the non-legacy optimisers "adam" and "rmsprop". No errors with the SGD optimiser, though. Below is the error from trying to run my script with the "rmsprop" optimiser.

Node: 'StatefulPartitionedCall_8' libdevice not found at ./libdevice.10.bc [[{{node StatefulPartitionedCall_8}}]] [Op:__inference_train_function_1102]

kevint0 avatar Jan 18 '23 20:01 kevint0

Hi,

adding a "me too" here - hoping it adds value and not just noise :)

I'm also seeing this issue in the following setup:

  • CUDA 11.7 installed on SLES from RPM packages (via the official Nvidia rep)
  • cuDNN 8.5.0 installed from cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
  • Tensorflow 2.11 installed via pip

This was not an issue with Tensorflow 2.10. With 2.11, I now get:

libdevice not found at ./libdevice.10.bc

mhaas avatar Feb 24 '23 10:02 mhaas

@drasmuss ,

I believe this is no longer an issue. I have cross-checked with the legacy optimizer and execution succeeds. Please refer to the attached logs below, and confirm whether this is still an issue for you.

17422_logs.txt

SuryanarayanaY avatar Apr 26 '23 15:04 SuryanarayanaY

Just checked, and it produces the same error as before. Here are the reproduction steps (I updated the installation steps to match the TF 2.12 instructions at https://www.tensorflow.org/install/pip#linux):

# these are the standard TF installation steps, copied here for clarity
conda install -c conda-forge cudatoolkit=11.8.0
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

# calling model.fit triggers the same error as before
python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"

Here is the full error log:

2023-04-26 12:33:40.887265: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-04-26 12:33:40.912502: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-26 12:33:41.303191: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-04-26 12:33:41.877110: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:41.892418: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:41.892768: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:41.894668: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:41.894937: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:41.895171: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:42.496930: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:42.497198: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:42.497225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1722] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2023-04-26 12:33:42.497474: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:42.497531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 8859 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
2023-04-26 12:33:43.559322: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x7fad47d31b00 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-04-26 12:33:43.559369: I tensorflow/compiler/xla/service/service.cc:177]   StreamExecutor device (0): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2023-04-26 12:33:43.562463: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-04-26 12:33:43.668017: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8600
2023-04-26 12:33:43.673970: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:530] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-11.8
  /usr/local/cuda
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2023-04-26 12:33:43.674114: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:274] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-04-26 12:33:43.674287: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:362 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-04-26 12:33:43.674323: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INTERNAL: libdevice not found at ./libdevice.10.bc
         [[{{node StatefulPartitionedCall_1}}]]
2023-04-26 12:33:43.682794: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:274] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-04-26 12:33:43.682947: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:362 : INTERNAL: libdevice not found at ./libdevice.10.bc
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File ".../lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

Detected at node 'StatefulPartitionedCall_1' defined at (most recent call last):
    File "<string>", line 1, in <module>
    File ".../lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File ".../lib/python3.9/site-packages/keras/engine/training.py", line 1685, in fit
      tmp_logs = self.train_function(iterator)
    File ".../lib/python3.9/site-packages/keras/engine/training.py", line 1284, in train_function
      return step_function(self, iterator)
    File ".../lib/python3.9/site-packages/keras/engine/training.py", line 1268, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File ".../lib/python3.9/site-packages/keras/engine/training.py", line 1249, in run_step
      outputs = model.train_step(data)
    File ".../lib/python3.9/site-packages/keras/engine/training.py", line 1054, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File ".../lib/python3.9/site-packages/keras/optimizers/optimizer.py", line 543, in minimize
      self.apply_gradients(grads_and_vars)
    File ".../lib/python3.9/site-packages/keras/optimizers/optimizer.py", line 1174, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File ".../lib/python3.9/site-packages/keras/optimizers/optimizer.py", line 650, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File ".../lib/python3.9/site-packages/keras/optimizers/optimizer.py", line 1200, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File ".../lib/python3.9/site-packages/keras/optimizers/optimizer.py", line 1250, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File ".../lib/python3.9/site-packages/keras/optimizers/optimizer.py", line 1245, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_1'
libdevice not found at ./libdevice.10.bc
         [[{{node StatefulPartitionedCall_1}}]] [Op:__inference_train_function_401]

It's possible that you have CUDA installed elsewhere on your system (not through conda) and that TensorFlow is finding libdevice in that installation.
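
A quick way to check that (a sketch; the search locations come from the XLA warning in the log above):

# Where does libdevice actually live on this machine -- the conda env, or a system-wide CUDA install?
find "$CONDA_PREFIX" /usr/local/cuda* -name 'libdevice*.bc' 2>/dev/null
# What is XLA being told, and which CUDA version was this TensorFlow built against?
echo "XLA_FLAGS=$XLA_FLAGS"
python -c "import tensorflow as tf; print(tf.sysconfig.get_build_info().get('cuda_version'))"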

drasmuss avatar Apr 26 '23 15:04 drasmuss

Hi @drasmuss ,

Could you please try the following commands and let us know whether they fix the error?

# Install NVCC
conda install -c nvidia cuda-nvcc=11.3.58
# Configure the XLA cuda directory
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
printf 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib/\n' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# Copy libdevice file to the required path
mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice
cp $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/
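
In the same shell (or after re-activating the environment), a quick sanity check (a sketch based on the paths used above):

# Both the copied libdevice file and the XLA flag should now be present
ls $CONDA_PREFIX/lib/nvvm/libdevice/libdevice.10.bc
echo $XLA_FLAGS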

Thanks!

SuryanarayanaY avatar May 09 '23 04:05 SuryanarayanaY

Yes, that makes the problem go away, although I would hesitate to call it a solution: it's quite a cumbersome process to repeat every time we create a new environment, and a definite downgrade in user experience compared to TF <= 2.10.
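
One way to cut the repetition down to a single command per environment (a rough sketch only, just collecting the steps from the previous comment; run it inside the freshly activated conda environment):

#!/usr/bin/env bash
# setup_xla_cuda.sh -- hypothetical helper wrapping the workaround above
# (assumes conda is on PATH and the target environment is already activated)

conda install -y -c nvidia cuda-nvcc=11.3.58

# Point XLA at the conda-provided CUDA files on every activation
mkdir -p "$CONDA_PREFIX/etc/conda/activate.d"
printf 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib/\n' >> "$CONDA_PREFIX/etc/conda/activate.d/env_vars.sh"

# Copy libdevice to the nvvm/libdevice path XLA expects
mkdir -p "$CONDA_PREFIX/lib/nvvm/libdevice"
cp "$CONDA_PREFIX/lib/libdevice.10.bc" "$CONDA_PREFIX/lib/nvvm/libdevice/"

echo "Done. Re-activate the environment to pick up XLA_FLAGS."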

drasmuss avatar May 09 '23 12:05 drasmuss

@chenmoneygithub do you know if this is a known issue?

rchao avatar May 18 '23 09:05 rchao

I ran into this same issue with WSL2 and the proposed fix did not initially work for me; however, it did eventually work once I rebooted my computer after updating the $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh file.

I had similar issues with the published pip install instructions for Tensorflow (https://www.tensorflow.org/install/pip#step-by-step_instructions) and had to reboot my system between several of the steps.

Hopefully this helps anyone running into this issue on WSL2
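
For anyone checking whether the reboot is actually needed in their setup: re-activating the environment re-runs the activate.d hooks, so the variables can be inspected directly (a small sketch; replace "tf" with your environment name):

conda deactivate && conda activate tf
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
echo "XLA_FLAGS=$XLA_FLAGS"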

danieljwiest avatar Sep 15 '23 14:09 danieljwiest

I had this problem as well and the fix above worked.

joaomamede avatar Oct 25 '23 05:10 joaomamede

I ran into the same problem on Linux Mint 21.2 (Victoria) x86_64 after creating a new environment with conda and installing tensorflow-gpu version 2.12.1 from the conda-forge channel. As suggested by @SuryanarayanaY, I followed his approach but without specifying the cuda-nvcc version (it installed 12.3.52), and it worked. Thank you very much again for the solution, @SuryanarayanaY!

conda install -c nvidia cuda-nvcc
# Configure the XLA cuda directory
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
printf 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib/\n' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# Copy libdevice file to the required path
mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice
cp $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/

Datagniel avatar Oct 26 '23 13:10 Datagniel

Same issue on Fedora 39 with a fresh tensorflow install from conda-forge; @SuryanarayanaY's fix works.

LogExE avatar Nov 12 '23 13:11 LogExE

Same issue, any official fix yet?

makra89 avatar Jan 30 '24 11:01 makra89