tf-keras icon indicating copy to clipboard operation
tf-keras copied to clipboard

New optimizers fail to load CUDA installed through conda

Open drasmuss opened this issue 2 years ago • 14 comments

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04 (WSL)
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.11
  • Python version: 3.9
  • Bazel version (if compiling from source): N/A
  • GPU model and memory: RTX 2080 Ti
  • Exact command to reproduce:
  1. Create a new environment, following the official installation instructions from here https://www.tensorflow.org/install/pip#linux:
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
pip install tensorflow
  1. Run the beginner MNIST tutorial (or any other tutorial that calls fit) from here https://keras.io/examples/vision/mnist_convnet/

Describe the problem.

An error is raised:

libdevice not found at ./libdevice.10.bc

Note that if you switch to using the legacy optimizers, by switching this line

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

to this

model.compile(loss="categorical_crossentropy", optimizer=keras.optimizers.legacy.Adam(), metrics=["accuracy"])

then the example runs successfully.

Describe the current behavior.

An error occurs when running the example.

Describe the expected behavior.

The example should run without error, as it does when using the legacy optimizers.

  • Do you want to contribute a PR? (yes/no): no

Standalone code to reproduce the issue.

https://keras.io/examples/vision/mnist_convnet/

Source code / logs.

Full stack trace of the error:

    File ".../tmp.py", line 47, in <module>
      model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1027, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_4'
libdevice not found at ./libdevice.10.bc
         [[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_1026]

Likely related to:

  • https://github.com/tensorflow/tensorflow/issues/56927
  • https://github.com/tensorflow/tensorflow/issues/59013

drasmuss avatar Jan 13 '23 16:01 drasmuss