Can get CUDA to work with torch but not with tensorflow

Open JuanVargas opened this issue 1 year ago • 10 comments
I am running Ubuntu 22.04.3 LTS with Python 3.10.12 and GCC 11.4.0. The system has an NVIDIA GeForce RTX 3060 card with driver version 535.129.03 and CUDA version 12.2.

I installed keras-3.0.2 in two different virtual envs, using tensorflow-cuda and torch-cuda as backends. In both cases I followed the instructions given:

For tf-cuda

pip install -r requirements-tensorflow-cuda.txt
python pip_build.py --install

For torch-cuda

pip install -r requirements-torch-cuda.txt
python pip_build.py --install

When I test the torch-cuda env, I get torch version 2.1.1+cu118, which detects CUDA. So that version seems to be OK.

But when I test the tf-cuda, I get

2023-12-30 07:34:38.417075: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-30 07:34:38.891617: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

tf.test.gpu_device_name() # should return a device name, but the result is empty
tf.config.list_physical_devices('GPU') # returns []

Any help/suggestions to fix this issue are greatly appreciated. Thank you.

JuanVargas avatar Dec 30 '23 13:12 JuanVargas

@JuanVargas - Can you try in a new virtual env -

pip install torch # also installs CUDA 12
pip install tensorflow # will download TF 2.15, which should use the same CUDA 12
# pip install jax (if needed)

Then test TensorFlow?

sampathweb avatar Dec 31 '23 07:12 sampathweb

Hi Ramesh

I tried the steps you suggested in a new virtual env. As you said, the tf/keras version installed is 2.15.0 and torch is 2.1.2+cu121; both were able to recognize the GPU in the system.

So it looks like tf/keras v 2.16 still needs some work. Thank you!
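When comparing envs like this, it can help to record exactly which package versions pip resolved in each one. A minimal sketch using only the standard library follows; the helper name `installed_versions` and the default package list are assumptions for illustration, not part of Keras.

```python
from importlib import metadata


def installed_versions(packages=("tensorflow", "tf-nightly", "keras", "torch")):
    """Return {package: version string}, or None for packages that are not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

This reads package metadata only, so it works even in an env where importing TensorFlow itself fails.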

Juan

JuanVargas avatar Dec 31 '23 19:12 JuanVargas

Hi @JuanVargas ,

I think the problem is due to the tf-nightly version pinned in requirements-tensorflow-cuda.txt for Keras 3.0.2.

https://github.com/keras-team/keras/blob/fe2f54aa5bc42fb23a96449cf90434ab9bb6a2cd/requirements-tensorflow-cuda.txt#L3

That tf-nightly version is the one the Keras 3.0.2 requirements pull in, but it fails to install the CUDA packages, with the error log below.

ERROR: Could not find a version that satisfies the requirement tensorrt-libs==8.6.1; extra == "and-cuda" (from tf-nightly[and-cuda]) (from versions: 9.0.0.post11.dev1, 9.0.0.post12.dev1, 9.0.1.post11.dev4, 9.0.1.post12.dev4, 9.1.0.post11.dev4, 9.1.0.post12.dev4, 9.2.0.post11.dev5, 9.2.0.post12.dev5)
ERROR: No matching distribution found for tensorrt-libs==8.6.1; extra == "and-cuda"
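The error says the pinned `tensorrt-libs==8.6.1` is not among the versions pip can currently find. As a rough illustration (the helper `parse_unsatisfied_pin` is hypothetical and only handles messages shaped like the one above), the pin and the available versions can be pulled out of the message and compared:

```python
import re

# The first error line from the log above, copied verbatim.
PIP_ERROR = (
    'ERROR: Could not find a version that satisfies the requirement '
    'tensorrt-libs==8.6.1; extra == "and-cuda" (from tf-nightly[and-cuda]) '
    '(from versions: 9.0.0.post11.dev1, 9.0.0.post12.dev1, 9.0.1.post11.dev4, '
    '9.0.1.post12.dev4, 9.1.0.post11.dev4, 9.1.0.post12.dev4, '
    '9.2.0.post11.dev5, 9.2.0.post12.dev5)'
)


def parse_unsatisfied_pin(message):
    """Extract (package, pinned version, available versions) from a pip resolver error."""
    pin = re.search(r"requirement (\S+?)==([^\s;]+)", message)
    if pin is None:
        raise ValueError("no '==' pin found in message")
    versions = re.search(r"\(from versions: ([^)]*)\)", message)
    available = [v.strip() for v in versions.group(1).split(",")] if versions else []
    return pin.group(1), pin.group(2), available


if __name__ == "__main__":
    name, pinned, available = parse_unsatisfied_pin(PIP_ERROR)
    print(f"{name} pinned to {pinned}; offered by the index: {pinned in available}")
```

For this message the pinned 8.6.1 is absent from the eight 9.x versions listed, which is why the install of the `and-cuda` extra fails.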

This tf-nightly pin has already been updated to a working version in Keras master:

https://github.com/keras-team/keras/blob/ccc202a94bbcf02023b6d32ef05a4326eced6e69/requirements-tensorflow-cuda.txt#L3

If you install Keras from master, this problem may not occur.

Thanks!

SuryanarayanaY avatar Jan 02 '24 06:01 SuryanarayanaY

It looks like this is the problem. Thank you so much for your feedback. Juan

JuanVargas avatar Jan 02 '24 16:01 JuanVargas

Hi @JuanVargas ,

Thanks for confirmation.

@sampathweb ,

Do we need to cherry-pick the above change (in the master branch) to 3.0.2?

SuryanarayanaY avatar Jan 09 '24 05:01 SuryanarayanaY

Yes, I think that would be nice and would help. The version of tf/keras that recognizes the GPUs is 2.15. I was hoping to get the GPUs working with Keras 3.

JuanVargas avatar Jan 09 '24 16:01 JuanVargas

Hi

In a previous comment you suggested: "... If you try installing keras master may be this problem will not occur ..."

Could you please let me know how to do that, so that I can use the GPUs and the CUDA API under version 2.16?

JuanVargas avatar Jan 16 '24 22:01 JuanVargas

I found that all I needed was to read your suggestion more carefully and edit requirements-tensorflow-cuda.txt to replace the version. So I created a new virtual env and did the following:

Edited requirements-tensorflow-cuda.txt to use 2.16.0-dev20240101

pip install -r requirements-tensorflow-cuda.txt

Ran python:

import tensorflow as tf
2024-01-16 17:59:46.686812: tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-16 17:59:47.235591: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

tf.__version__
'2.16.0-dev20240101'

tf.config.list_physical_devices('GPU') # returns an empty list []
2024-01-16 18:03:08.205240:
2024-01-16 18:03:08.246976: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2256] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
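The "Cannot dlopen some GPU libraries" warning means the dynamic loader could not find one or more CUDA shared libraries. One way to narrow that down is to try loading candidates directly. This is a sketch under assumptions: the library file names below are guesses for a CUDA 12 setup and are not taken from the log, so adjust them to whatever your TensorFlow build expects.

```python
import ctypes

# Assumed sonames for a CUDA 12 install; these are illustrative guesses, not from the log.
CANDIDATE_LIBS = ("libcudart.so.12", "libcublas.so.12", "libcudnn.so.8")


def probe_libraries(names=CANDIDATE_LIBS):
    """Return {library name: True if the dynamic loader can open it, else False}."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the loader cannot find or open the library.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, found in probe_libraries().items():
        print(f"{name}: {'found' if found else 'NOT found'}")
```

A library reported as not found here may still exist inside the virtual env's site-packages; in that case the loader search path (for example `LD_LIBRARY_PATH`) is the thing to inspect.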

JuanVargas avatar Jan 16 '24 23:01 JuanVargas

@JuanVargas Hi, I have run into the same problem. Did you solve it?

lingluodlut avatar Jan 24 '24 10:01 lingluodlut

I could not solve the problem. I had to go back to version 2.15 :-(

JuanVargas avatar Jan 24 '24 13:01 JuanVargas