keras-extras
Incompatible shapes
I am running make_parallel with 2 GPUs; the error occurs at gradients/sub_grad/BroadcastGradientArgs:
InvalidArgumentError (see above for traceback): Incompatible shapes: [483,1] vs. [482,1] [[Node: gradients/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@sub"], _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_grad/Shape, gradients/sub_grad/Shape_1/_79)]]
I get the exact same error. Would appreciate some help on this.
I get a similar error. I guess it is because the last minibatch has an odd number of samples, while the parallelized model only produced an even number of predictions.
Did you hardcode the batch size in your first layer input (batch_input_shape), or give input_dim?
@Caduceus96 just gave input_dim. Batch size is hardcoded when I call fit
The same error here! Running Keras 2.0.2 with TensorFlow 0.12.1:
InvalidArgumentError (see above for traceback): Incompatible shapes: [6376,256] vs. [6379,256]
[[Node: gradients/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@sub"], _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_grad/Shape/_459, gradients/sub_grad/Shape_1)]]
[[Node: gradients/concatenate_1/concat_grad/Slice_7/_491 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:7", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_3913_gradients/concatenate_1/concat_grad/Slice_7", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:7"]()]]
It might be related to the get_slice function. I found that if the number of input samples is a multiple of your batch size, then there is no such error.
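For context, this is roughly what get_slice in multi_gpu.py does (paraphrased as a sketch, written with the TF 1.x tf.concat argument order; don't take it as the exact source):

    import tensorflow as tf

    def get_slice(data, idx, parts):
        # Take the idx-th of `parts` equal slices along the batch axis.
        # Integer division silently drops any remainder of the batch.
        shape = tf.shape(data)
        size = tf.concat([shape[:1] // parts, shape[1:]], axis=0)
        stride = tf.concat([shape[:1] // parts, shape[1:] * 0], axis=0)
        start = stride * idx
        return tf.slice(data, start, size)

Because shape[:1] // parts rounds down, a 483-row batch on 2 GPUs yields two 241-row slices that concatenate to 482 predictions while the targets still have 483 rows, which is exactly the [483,1] vs. [482,1] mismatch reported above.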
OK, I'm probably wrong. The error seems to come from my callback function. If I don't use callbacks, everything is fine no matter how many rows of input data there are.
I actually see this error when I try to run the example on the website.
Same as @Eric2333: don't use callbacks, or change them to lambda functions, and it works fine.
Also ran into this error with Keras 2.0.3 and TensorFlow 1.1.0. It happens at the end of the first epoch of training, possibly while calculating validation. (I do use callbacks for checkpointing and early stopping.) Will try without.
73997312/73997516 [============================>.] - ETA: 0s - loss: 12.1832
/home/ubuntu/devhome/tensorwords2/multi_gpu.py:45: UserWarning: The merge function is deprecated and will be removed after 08/2017. Use instead layers from keras.layers.merge, e.g. add, concatenate, etc.
  merged.append(merge(outputs, mode='concat', concat_axis=0))
/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/legacy/layers.py:460: UserWarning: The Merge layer is deprecated and will be removed after 08/2017. Use instead layers from keras.layers.merge, e.g. add, concatenate, etc.
  name=name)
/home/ubuntu/devhome/tensorwords2/multi_gpu.py:47: UserWarning: Update your Model call to the Keras 2 API: Model(inputs=[<tf.Tenso..., outputs=[<tf.Tenso...)
  return Model(input=model.inputs, output=merged)
Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1039, in _do_call
return fn(*args)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1021, in _run_fn
status, run_metadata)
File "/home/ubuntu/.pyenv/versions/3.6.1/lib/python3.6/contextlib.py", line 89, in exit
next(self.gen)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [204,34] vs. [200,34]
[[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_merge_1_target_0/_9, Log)]]
[[Node: gradients/merge_1/concat_grad/Slice_3/_529 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:3", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_12702_gradients/merge_1/concat_grad/Slice_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:3"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./TextGenLearn3.py", line 293, in
Caused by op 'mul', defined at:
File "./TextGenLearn3.py", line 293, in
InvalidArgumentError (see above for traceback): Incompatible shapes: [204,34] vs. [200,34] [[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_merge_1_target_0/_9, Log)]] [[Node: gradients/merge_1/concat_grad/Slice_3/_529 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:3", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_12702_gradients/merge_1/concat_grad/Slice_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:3"]]
The number of samples just needs to be a multiple of the total number of GPUs. For example, I had 68531 samples in my input, and once I shaved that down to 68528 for 8 GPUs, it worked fine.
@jwilt1 Thanks!! Your example is nice work. I modified my code; the input sample size must be a multiple of n_gpu.
If you have a large training set it's not an issue, and you can always cut it like:

train_cut = len(train_index) % GPUs
if train_cut:  # guard added: train_index[:-0] would drop everything
    train_index = train_index[:-train_cut]
And it works fine. But after training I have an issue with predictions; the number of samples has to be a multiple of the number of GPUs as well. Any ideas?
You can use the same kind of trick as for training, but instead of removing the last remainder elements, you pad the end of your dataset to make it divisible by the number of GPUs, then select the unpadded indices as your actual predictions.
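A minimal sketch of that padding trick (the helper name and the repeat-the-last-row padding scheme are mine, not from multi_gpu.py):

    import numpy as np

    def predict_padded(model, X, n_gpus):
        # Pad the batch axis up to the next multiple of n_gpus, predict,
        # then drop the rows that correspond to the padding.
        n = len(X)
        pad = (-n) % n_gpus
        if pad:
            X = np.concatenate([X, np.repeat(X[-1:], pad, axis=0)], axis=0)
        return model.predict(X)[:n]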
@Caduceus96 I sliced my training data into a multiple of the number of GPUs; the first epoch runs well, but when it comes to the second epoch, an error is raised:
3792/3800 [============================>.] - ETA: 0s - loss: 11.5726 - mean_squared_error: 1.9049Traceback (most recent call last):
......
InvalidArgumentError (see above for traceback): Incompatible shapes: [12,3] vs. [14,3] [[Node: sub = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](concatenate_2/concat/_851, _recv_concatenate_2_target_0/_853)]] [[Node: add_3/_857 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_3571_add_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
train_shape=[(3800, None, 1)] * 10, valid_shape=[(254, None, 1)] * 10 (corresponding to train_shape), num_gpu = 4, train_batch=16
Is your training set size evenly divisible by gpu #?
@Caduceus96 I guess so, 3800 / 4 = 950.
@JiangLing-han it is evident you are using small batch sizes during your training (the progress bar output from your Keras model.train routine stops at 3792/3800). You need to make sure your batches are of equal size and divisible by 4.
@ktamiola @Caduceus96 I solved this problem by setting the size of the validation set to a multiple of 4. The model was copied, and the validation data was sliced the same way as the training data. Many thanks to you. :)
If you want to predict just one sample at a time, instead of a multiple of the number of GPUs used during training, you can create a 2nd model that is identical and load the weights of your parallelized model.
- Create a model named model1
- Create model2 by applying the make_parallel function to model1
- Train model2 with 8 GPUs
- Set model1 weights to the weights of model2: model1.set_weights(model2.get_weights())
- Predict however many you want at a time using model1 (see the sketch below)
model1.predict(val[0:10,:,:]) -> success
model2.predict(val[0:10,:,:]) -> ValueError: could not broadcast input array from shape (8,2) into shape (10,2)
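A minimal sketch of those steps, assuming this repo's make_parallel and a toy Dense model (the data shapes and the 8-GPU count are placeholders):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense
    from multi_gpu import make_parallel  # this repo's multi_gpu.py

    X_train = np.random.rand(1024, 16)  # 1024 samples: divisible by 8 GPUs
    y_train = np.random.rand(1024, 2)
    X_val = np.random.rand(10, 16)      # 10 samples: NOT divisible by 8

    model1 = Sequential([Dense(32, activation='relu', input_dim=16), Dense(2)])
    model2 = make_parallel(model1, 8)
    model2.compile(optimizer='adam', loss='mse')
    model2.fit(X_train, y_train, batch_size=256, epochs=1)

    # model1 has the same architecture but no slicing/concat layers,
    # so after copying the weights it accepts any number of samples.
    model1.set_weights(model2.get_weights())
    preds = model1.predict(X_val)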
Many thanks for your code!
I would suggest adding a note at the beginning of the make_parallel function stating that the size of the training/validation data should be divisible by the number of GPUs. It is opaque for a user to see why training is okay but, after an epoch, an exception about incompatible shapes is thrown.
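Failing that, a fail-fast check before fit would at least make the cause explicit (a sketch; x_train, x_val, and gpu_count are placeholders, since make_parallel itself never sees the data):

    def check_divisible(n_samples, n_gpus, name="training"):
        # Fail fast instead of dying mid-epoch with "Incompatible shapes".
        if n_samples % n_gpus != 0:
            raise ValueError("%s set has %d samples, which is not divisible "
                             "by %d GPUs; trim or pad it first."
                             % (name, n_samples, n_gpus))

    check_divisible(len(x_train), gpu_count)
    check_divisible(len(x_val), gpu_count, name="validation")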
Has anyone else faced an error using regularizers? Using layers like this:
def conv2d_bn(x, nb_filter, nb_row, nb_col, padding='same', strides=(1, 1), bias=False):
    """
    Utility function to apply conv + BN.
    (Slightly modified from https://github.com/fchollet/keras/blob/master/keras/applications/inception_v3.py)
    """
    if K.image_data_format() == "channels_first":
        channel_axis = 1
    else:
        channel_axis = -1
    x = Convolution2D(nb_filter, (nb_row, nb_col),
                      strides=strides,
                      padding=padding,
                      use_bias=bias,
                      kernel_regularizer=regularizers.l2(0.00004),  # <---- causes the error: no _losses
                      kernel_initializer=initializers.VarianceScaling(scale=2.0, mode='fan_in',
                                                                      distribution='normal',
                                                                      seed=None))(x)
    x = BatchNormalization(axis=channel_axis, momentum=0.9997, scale=False)(x)
    x = Activation('relu')(x)
    return x
I get the error:
AttributeError: 'Model' object has no attribute '_losses'
caused by the outputs = model(inputs) call that merges the outputs of the different splits into one model.
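For reference, a minimal stand-alone sketch of the failing pattern (the shapes are arbitrary; on the affected Keras versions the inner(outer_in) call reportedly raises the AttributeError):

    from keras.models import Model
    from keras.layers import Input, Dense
    from keras import regularizers

    inp = Input(shape=(8,))
    out = Dense(4, kernel_regularizer=regularizers.l2(0.00004))(inp)
    inner = Model(inp, out)  # a Model that carries a weight regularizer

    outer_in = Input(shape=(8,))
    outer_out = inner(outer_in)  # the outputs = model(inputs) call that
                                 # reportedly raises: 'Model' object has
                                 # no attribute '_losses'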
batch size: 64
number of batches: 20
number of GPUs: 2
The error I got:
InvalidArgumentError: Incompatible shapes: [64,2] vs. [128,2]
How can I deal with this?
@DNXie, I am having the same error; shape[0] gets halved. Did you find a solution?
A related issue: https://github.com/keras-team/keras/issues/9449
Same issue here with the latest Keras version.
Hi, was a fix issued for this error? I am facing the same issue. model.fit works for batch size 64 when not using multiple GPUs, but when I put the same model through multi_gpu_model and call fit on it, it raises an error that 16 and 64 are incompatible shapes.
I am getting the error tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [7600] vs. [400,19]. Some pointers are as follows:
- I get this error only when run my code on a GPU node (Tesla k80)
- I do not get the error for batch_size = 1
- I do not get the error when I do not use metrics=['accuracy'] in compile.
- I get the error only for some particular architecture
- All the problems reported above involve arrays of the same dimensionality ([n1,n2] vs. [m1,m2]), but my case is [n] vs. [n/r, r]
The full error is as follows:
MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Epoch 1/10
Traceback (most recent call last):
File "driver_training.py", line 66, in
Here is the full code:

import numpy as np
from keras.models import Model
from keras import optimizers
from keras.layers import Input, Dense, Embedding
import keras

num_decoder_tokens = 40
len_label_vector = 20
latent_dim = 300

train_labels_vecs = np.random.randint(num_decoder_tokens, size=(100, len_label_vector))

decoder_input_data = train_labels_vecs[:, :-1]
decoder_target_data = train_labels_vecs[:, 1:]

decoder_inputs = Input(shape=(None,), name='Decoder-Input')  # for teacher forcing
x = Embedding(num_decoder_tokens, latent_dim, name='Decoder-Word-Embedding', mask_zero=False)(decoder_inputs)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax', name='Final-Output-Dense')(x)

seq2seq_Model = Model([decoder_inputs], decoder_outputs)
print(seq2seq_Model.summary())

seq2seq_Model.compile(optimizer=optimizers.Nadam(lr=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

history = seq2seq_Model.fit([decoder_input_data], np.expand_dims(decoder_target_data, -1), validation_split=0.12, epochs=10, batch_size=2)
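For what it's worth, 400 * 19 = 7600, so the [7600] vs. [400,19] mismatch comes from the accuracy metric comparing a flattened tensor against a 2-D one. A possible (untested) workaround is to request the sparse accuracy metric explicitly instead of the generic 'accuracy':

    seq2seq_Model.compile(optimizer=optimizers.Nadam(lr=0.001),
                          loss='sparse_categorical_crossentropy',
                          metrics=['sparse_categorical_accuracy'])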
@jayanti-prasad
Same error here, and the following are also true when I run a seq2seq architecture on a local PC:
- I do not get the error for batch_size = 1
- I do not get the error when I do not use metrics=['accuracy'] in compile.
BUT there is no error when I run the code in a Kaggle kernel with the same TF version (1.12.0) and Keras version (2.2.4).
I also have a very similar error, and changing the batch size and sample size to be a multiple of the number of GPUs doesn't solve the problem. My error is as follows:
InvalidArgumentError: Incompatible shapes: [128,32,32,3] vs. [256,32,32,3]
[[{{node replica_1/sequential_1/conv_lst_m2d_1/while/mul_3}} = Mul[T=DT_FLOAT, _class=["loc:@train...rayWriteV3"], _device="/job:localhost/replica:0/task:0/device:GPU:1"](replica_1/sequential_1/conv_lst_m2d_1/while/TensorArrayReadV3, replica_1/sequential_1/conv_lst_m2d_1/while/mul_3/Enter)]]
[[{{node loss/mul/_305}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5049_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
This problem only happens when the model has a ConvLSTM2D layer; without it, the code runs just fine. As for other properties:
- I am using 2 GPUs
- Sample size 2048
- batch size 256
- Each of my input samples has shape [21, 32, 32, 1], where 21 is the temporal size, the images are 32 x 32, and there is 1 channel