Dear Authors:
I want to know whether this code supports GPU training. I tried installing the latest version of tensorflow-gpu and also version 0.12.1, and I get errors saying that tensor shapes don't match. Did you get the same error, and how can I fix it? Thanks
Well, our code should automatically put your data on your GPU...
What kind of errors do you get? I cannot help you if you don't provide the exact error you are encountering ;)
Hi, Bartzi:
Thanks for your reply. I installed tensorflow-gpu==1.12.0 and got the following error:
Logging to logs/2018-11-19-17-25-30
Traceback (most recent call last):
File "train.py", line 88, in <module>
model_file_name = train(cli_args, log_dir)
File "train.py", line 38, in train
model = model_class.create_model(train_data_generator.get_input_shape(), config)
File "/home/pu.song/Documents/ASRDev/LID/crnn-lid/keras/models/topcoder_crnn_finetune.py", line 54, in create_model
model.add(Bidirectional(LSTM(512, return_sequences=False), merge_mode="concat"))
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/models.py", line 324, in add
output_tensor = layer(self.outputs[0])
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/engine/topology.py", line 491, in __call__
self.build(input_shapes[0])
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/layers/wrappers.py", line 218, in build
self.forward_layer.build(input_shape)
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/layers/recurrent.py", line 733, in build
self.W = K.concatenate([self.W_i, self.W_f, self.W_c, self.W_o])
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 753, in concatenate
return tf.concat(axis, [to_dense(x) for x in tensors])
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1122, in concat
tensor_shape.scalar())
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 848, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (4, 256, 512) and () are incompatible
When I installed tensorflow-gpu==0.12.1, I got the following errors:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.6705
pciBusID 0000:03:00.0
Total memory: 10.91GiB
Free memory: 9.81GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0)
WARNING:tensorflow:From /home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/callbacks.py:517 in _set_model.: merge_all_summaries (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge_all.
WARNING:tensorflow:From /home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/callbacks.py:521 in _set_model.: init (from tensorflow.python.training.summary_io) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.FileWriter. The interface and behavior is the same; this is just a rename.
Epoch 1/50
E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 6021 (compatibility version 6000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)
It also seems that your Keras code eats up all the GPU memory very quickly. Thanks.
Our code is not eating up all the available memory; that is TensorFlow's behavior, since by default TensorFlow allocates all available GPU memory up front...
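If you want TensorFlow to leave part of the GPU free, TensorFlow 0.12/1.x lets you cap the share it reserves with `tf.GPUOptions(per_process_gpu_memory_fraction=...)`, or let it allocate on demand with `allow_growth=True`. The sketch below only computes the fraction for a chosen budget, using the 10.91 GiB total reported in the log; the helper name `memory_fraction` and the 4 GiB budget are illustrative assumptions, and the commented lines show how the value would typically be handed to Keras on the TensorFlow backend.

```python
def memory_fraction(budget_gib, total_gib):
    """Return the share of GPU memory to reserve, clamped to at most 1.0.

    Hypothetical helper: TensorFlow itself only receives the resulting float.
    """
    if budget_gib <= 0 or total_gib <= 0:
        raise ValueError("budget and total memory must be positive")
    return min(float(budget_gib) / total_gib, 1.0)


# Total memory reported by gpu_device.cc for the GTX 1080 Ti in the log above.
TOTAL_GIB = 10.91

fraction = memory_fraction(4.0, TOTAL_GIB)
print("per_process_gpu_memory_fraction = %.3f" % fraction)

# With TensorFlow 0.12/1.x and Keras on the TensorFlow backend, the value
# would typically be passed like this (not executed here):
#
#   import tensorflow as tf
#   from keras.backend.tensorflow_backend import set_session
#   gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=fraction)
#   # alternatively: tf.GPUOptions(allow_growth=True) to allocate on demand
#   set_session(tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)))
```

Capping the fraction mainly helps when several processes share one GPU; it does not make a single training run need less memory.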
Let's have a look at your problems:
- TensorFlow 1.12.0: it seems that the data loader does not supply the correct data format... are you using the correct data?
- TensorFlow 0.12.1: you have a newer cuDNN library installed than this TensorFlow build expects, and the program tells you this:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 6021 (compatibility version 6000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
So you'll have to either compile the old TensorFlow version yourself, install a different version of cuDNN, or use a more recent version of TensorFlow.
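The numbers in that message follow cuDNN's version encoding, major * 1000 + minor * 100 + patch level, so 6021 is cuDNN 6.0.21 and 5105 is cuDNN 5.1.5. The "compatibility version" appears to drop the patch level (6000 vs. 5100), and the two must match. The sketch below reproduces that comparison and reads the installed version from the `cudnn.h` header. The helper names and the rounding rule (inferred from the message) are assumptions. The header path is the usual one under a CUDA install prefix, and newer cuDNN releases keep these defines in `cudnn_version.h` instead.

```python
import re


def decode_cudnn_version(version):
    """Split an encoded cuDNN version (major*1000 + minor*100 + patch)."""
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def compatibility_version(version):
    """Drop the patch level, e.g. 6021 -> 6000 and 5105 -> 5100 (as in the log)."""
    return (version // 100) * 100


def is_compatible(runtime_version, compiled_version):
    """Assumed rule: the loaded library must match the compiled major.minor."""
    return compatibility_version(runtime_version) == compatibility_version(compiled_version)


def version_from_header(text):
    """Read CUDNN_MAJOR/MINOR/PATCHLEVEL from the contents of cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % name, text)
        if match is None:
            raise ValueError("%s not found in header" % name)
        parts.append(int(match.group(1)))
    major, minor, patch = parts
    return major * 1000 + minor * 100 + patch


if __name__ == "__main__":
    # Values copied from the error message in this thread.
    print(decode_cudnn_version(6021))  # runtime library that was loaded
    print(decode_cudnn_version(5105))  # version tensorflow-gpu 0.12.1 was compiled with
    print(is_compatible(6021, 5105))
    # On a real machine, e.g.:
    # version_from_header(open("/usr/local/cuda-8.0/include/cudnn.h").read())
```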
Thank you. I got it working on my machine.
@songpu2015617
Can you please tell me how you got it working on your machine?
Also, what are your cuDNN and CUDA versions?
Thanks
Hello everyone!
I am using Google Colab for training. I enabled the GPU, but it is not utilized. I get the following message from Colab:
You are not utilizing GPU runtime, please switch to standard runtime
How can I make this code use Colab's GPU?
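One quick check in the runtime is which TensorFlow distribution pip actually installed. For the TensorFlow versions discussed in this thread, the plain `tensorflow` package is CPU-only, while the GPU build was published separately as `tensorflow-gpu`. The sketch below assumes Python 3.8+ (for `importlib.metadata`), and the helper name `installed_tensorflow_flavours` is made up for illustration.

```python
from importlib import metadata


def installed_tensorflow_flavours(candidates=("tensorflow", "tensorflow-gpu")):
    """Map each candidate distribution name to its installed version, or None."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_tensorflow_flavours().items():
        print("%-15s %s" % (name, version or "not installed"))
```

If only `tensorflow` shows up, training will run on the CPU regardless of the runtime type selected in Colab.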
@nikhil031294
- I used Ubuntu 16.04
- disabled the nouveau driver and used the shipped NVIDIA driver (384.130)
- installed CUDA 8.0 via the runfile (https://docs.nvidia.com/cuda/archive/8.0/cuda-installation-guide-linux/index.html) but did not update the driver
- then downloaded cuDNN 5.1 for CUDA 8.0 (https://developer.nvidia.com/rdp/cudnn-archive), moved the library to /usr/local/cuda-8.0/lib64 and the header to /usr/local/cuda-8.0/include
- set the paths:
-- $ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
-- $ export PATH=/usr/local/cuda-8.0/bin:$PATH
- cloned the repo and replaced tensorflow==0.12.1 with tensorflow-gpu==0.12.1 in requirements.txt before installing
You might want to look here: (https://chromium.googlesource.com/external/github.com/tensorflow/tensorflow/+/refs/heads/r0.12/tensorflow/g3doc/get_started/os_setup.md)
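The path setup and the requirements.txt swap from the steps above can also be scripted. This is a minimal sketch assuming the /usr/local/cuda-8.0 prefix used in those steps. The function names and the sample requirements text are hypothetical. train.py would still need to be launched with the returned environment, for example via `subprocess.call(["python", "train.py"], env=env)`.

```python
import os


def swap_to_gpu_requirement(requirements_text, version="0.12.1"):
    """Replace the CPU-only tensorflow pin with the matching tensorflow-gpu pin."""
    cpu_pin = "tensorflow==%s" % version
    gpu_pin = "tensorflow-gpu==%s" % version
    lines = []
    for line in requirements_text.splitlines():
        # Compare the whole stripped line so an existing 'tensorflow-gpu==...' is left alone.
        lines.append(gpu_pin if line.strip() == cpu_pin else line)
    return "\n".join(lines) + ("\n" if requirements_text.endswith("\n") else "")


def _prepend(path_value, entry):
    """Prepend entry to a colon-separated path, skipping the ':' when it is empty."""
    return entry + (":" + path_value if path_value else "")


def cuda_environment(base_env, cuda_home="/usr/local/cuda-8.0"):
    """Return a copy of base_env with CUDA's lib64 and bin directories prepended."""
    env = dict(base_env)
    env["LD_LIBRARY_PATH"] = _prepend(env.get("LD_LIBRARY_PATH", ""), os.path.join(cuda_home, "lib64"))
    env["PATH"] = _prepend(env.get("PATH", ""), os.path.join(cuda_home, "bin"))
    return env


if __name__ == "__main__":
    # Illustrative sample, not the repository's actual requirements.txt.
    print(swap_to_gpu_requirement("numpy\ntensorflow==0.12.1\n"))
    env = cuda_environment(os.environ)
    print(env["LD_LIBRARY_PATH"].split(":")[0])
```

Unlike the shell `export` lines, `_prepend` avoids leaving a trailing ':' when the variable was previously unset.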