
Shapes do not match

Open Threynaud opened this issue 8 years ago • 7 comments

Hi, me again sorry!

I get this error when I run your code:

Traceback (most recent call last):

File "cbow_model.py", line 49, in <module>
  word_context_product = merge([word_embedding, cbow], mode='dot')
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 1680, in merge
  name=name)
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 1301, in __init__
  self.add_inbound_node(layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 635, in add_inbound_node
  Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 172, in create_node
  output_tensors = to_list(outbound_layer.call(input_tensors, mask=input_masks))
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 1409, in call
  output = K.batch_dot(l1, l2, self.dot_axes)
File "/usr/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 908, in batch_dot
  out = tf.matmul(x, y, adjoint_a=adj_x, adjoint_b=adj_y)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1855, in matmul
  a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1454, in _mat_mul
  transpose_b=transpose_b, name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
  op_def=op_def)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2397, in create_op
  set_shapes_for_outputs(ret)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1757, in set_shapes_for_outputs
  shapes = shape_func(op)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1707, in call_with_requiring
  return call_cpp_shape_fn(op, require_shape_fn=True)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
  debug_python_shape_fn, require_shape_fn)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 675, in _call_cpp_shape_fn_impl
  raise ValueError(err.message)
ValueError: Shape must be rank 2 but is rank 3 for 'MatMul' (op: 'MatMul') with input shapes: [?,1,100], [?,100].

Embedding layers indeed return 3D tensors of shape (samples, sequence_length, embedding_dim)...
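
For what it's worth, a minimal sketch of where the ranks diverge (made-up vocabulary size, window_size=3 as in my setup): the Embedding layer always returns a 3D tensor, even when the input length is 1, while the averaging Lambda collapses the context back to 2D.

from keras import backend as K
from keras.layers import Input, Embedding, Lambda

word_index = Input(shape=(1,))      # (None, 1)
context = Input(shape=(6,))         # (None, 2 * window_size)

emb = Embedding(input_dim=1000, output_dim=100)
word_embedding = emb(word_index)    # (None, 1, 100)  -> rank 3
cbow = Lambda(lambda x: K.mean(x, axis=1),
              output_shape=(100,))(emb(context))   # (None, 100)  -> rank 2

# merge([word_embedding, cbow], mode='dot') then asks TensorFlow to MatMul
# a [?, 1, 100] tensor with a [?, 100] one, which is exactly the error above.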

Threynaud avatar Feb 27 '17 22:02 Threynaud

I used the Theano backend during testing; I haven't tested this on TensorFlow. You can change the Keras backend as specified here
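
Roughly, either edit ~/.keras/keras.json and set "backend": "theano", or set the environment variable before importing Keras (a quick sketch, untested here):

import os
os.environ['KERAS_BACKEND'] = 'theano'  # must be set before the first keras import
import keras  # should print: Using Theano backend.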

abaheti95 avatar Feb 28 '17 04:02 abaheti95

From looking at your error, I think you need to either flatten or reshape the word_embedding or cbow tensor.
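
Something along these lines, maybe (a rough sketch using the tensor names from cbow_model.py, with EMBEDDING_DIM = 100 as in the traceback):

from keras.layers import Flatten, Reshape

# Option 1: drop the length-1 axis of the target word embedding...
word_embedding_2d = Flatten()(word_embedding)   # (None, 1, 100) -> (None, 100)

# Option 2: ...or lift cbow back to rank 3 so both inputs to the merge match.
cbow_3d = Reshape((1, 100))(cbow)               # (None, 100) -> (None, 1, 100)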

abaheti95 avatar Feb 28 '17 04:02 abaheti95

It is actually crazy that the behavior differs between the two backends! I can't figure out how to make it work with TensorFlow. I can Flatten() word_embedding because its sequence length is 1, but flattening won't work for negative_words_embedding since it holds n vectors...

Threynaud avatar Mar 01 '17 09:03 Threynaud

I think you can reshape it with Keras's Reshape layer.
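
Something like this, perhaps (a rough sketch reusing the tensor names from your script; I haven't run it):

from keras.layers import Flatten, Reshape, merge

cbow_3d = Reshape((1, EMBEDDING_DIM))(cbow)     # (None, 1, emb_size)

# The rank-3 cbow can be dotted with the single target word...
word_dot = Flatten()(merge([word_embedding, cbow_3d], mode='dot'))

# ...and with the n negative samples, by contracting the embedding axis (2).
neg_dot = Flatten()(merge([negative_words_embedding, cbow_3d],
                          mode='dot', dot_axes=[2, 2]))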

abaheti95 avatar Mar 01 '17 10:03 abaheti95

I managed to make it work, but it is painfully slow, even on a decent GPU... I think it is because too many parameters are being updated in the embedding layer at each iteration! I am investigating how partial updates can be done!

Threynaud avatar Mar 01 '17 17:03 Threynaud

I'm working on the CNTK version. It uses sparse gradients for embeddings, so I hope it will be faster!

abaheti95 avatar Mar 06 '17 11:03 abaheti95

Actually, with the TensorFlow backend, you can use the TF optimizers directly, and they already do sparse updates! It's almost a 10x speedup on my laptop! Here is the TF-compatible version of your code in case you are interested:

#!/usr/bin/env python

import logging

import numpy as np
from keras import backend as K
from keras.layers import Flatten, Input, Lambda, Reshape, merge
from keras.layers.embeddings import Embedding
from keras.models import Model

from save_embeddings import save_embeddings
from sentences_iterator import SentencesIterator
from vocab_generator import VocabGenerator

EMBEDDING_DIM = 100

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)

sentences = SentencesIterator('/Users/thms/clustree/data/test_vectorization/gcs/part-00000')

v_gen = VocabGenerator(sentences=sentences, min_count=20, window_size=3,
                       sample_threshold=-1, negative=5)

v_gen.scan_vocab()
v_gen.filter_vocabulary()
reverse_vocab = v_gen.generate_inverse_vocabulary_lookup('test_lookup')

# Generate the embedding matrix with all values in [-1/(2d), 1/(2d)], d = EMBEDDING_DIM
embedding = np.random.uniform(-1.0 / (2 * EMBEDDING_DIM),
                              1.0 / (2 * EMBEDDING_DIM),
                              (v_gen.vocab_size + 3, EMBEDDING_DIM))

# Creating CBOW model
# Model has 3 inputs
# Current word index, context words indexes and negative sampled word indexes
word_index = Input(shape=(1,))
context = Input(shape=(2*v_gen.window_size,))
negative_samples = Input(shape=(v_gen.negative,))

# All inputs are processed through a common embedding layer
shared_embedding_layer = Embedding(input_dim=(v_gen.vocab_size + 3),
                                   output_dim=EMBEDDING_DIM,
                                   weights=[embedding])

word_embedding = shared_embedding_layer(word_index)
# Shape output = (None,1,emb_size)
context_embeddings = shared_embedding_layer(context)
# Shape output = (None, 2*window_size, emb_size)
negative_words_embedding = shared_embedding_layer(negative_samples)
# Shape output = (None, negative, emb_size)

# Context words are averaged to get the CBOW vector
cbow = Lambda(lambda x: K.mean(x, axis=1),
              output_shape=(EMBEDDING_DIM,))(context_embeddings)
# Shape output = (None, emb_size)
cbow = Reshape((1, EMBEDDING_DIM))(cbow)
# Shape output = (None, 1, emb_size)

# Context is multiplied (dot product) with current word and negative
# sampled words
word_context_product = merge([word_embedding, cbow], mode='dot')
# Shape output = (None, 1, 1)
word_context_product = Flatten()(word_context_product)
# Shape output = (None,1)
negative_context_product = merge([negative_words_embedding, cbow],
                                 mode='dot',
                                 dot_axes=[2, 2])
# Shape output = (None, negative, 1)
negative_context_product = Flatten()(negative_context_product)
# Shape output = (None, negative)

# The model outputs the two dot products
model = Model(input=[word_index, context, negative_samples],
              output=[word_context_product, negative_context_product])

# Binary crossentropy is applied on the output
model.compile(optimizer=K.tf.train.RMSPropOptimizer(0.02),
              loss='binary_crossentropy')
print(model.summary())

model.fit_generator(v_gen.pretraining_batch_generator(reverse_vocab),
                    samples_per_epoch=v_gen.corpus_count,
                    nb_epoch=5)

# Save the trained embedding
save_embeddings("embedding.txt",
                shared_embedding_layer.get_weights()[0],
                v_gen.vocabulary)

Note that I am calling a native TF optimizer directly through the backend module: K.tf.train.RMSPropOptimizer(0.02)
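
For what it's worth, here is a tiny plain-TensorFlow sketch (made-up sizes, nothing from the model above) of why this gives partial updates: gathering rows from the embedding matrix produces an IndexedSlices gradient, so the optimizer only rewrites the rows that were actually looked up.

import tensorflow as tf

embeddings = tf.Variable(tf.random_uniform([10000, 100], -0.005, 0.005))
word_ids = tf.placeholder(tf.int32, shape=[None])

looked_up = tf.gather(embeddings, word_ids)    # only these rows enter the graph
loss = tf.reduce_sum(tf.square(looked_up))     # stand-in loss, just for the sketch

grad = tf.gradients(loss, [embeddings])[0]     # tf.IndexedSlices, not a dense tensor
train_op = tf.train.RMSPropOptimizer(0.02).minimize(loss)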

I'll try this on an Amazon P2 instance with a big GPU when I have some time, to see how fast it can get, but I feel like some other optimizations can be made, maybe by feeding batches as input instead of sending the data one array at a time. I suspect the text preprocessing (converting words to indices) at each iteration is also quite costly because there is a lot of overlap between windows! I'll try to preprocess beforehand to see if it improves performance!
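
For the preprocessing part, something as simple as this might do (a rough sketch: I'm assuming the iterator yields lists of tokens, that reverse_vocab behaves like a dict, and UNK_INDEX is a made-up placeholder for out-of-vocabulary words):

# Convert every sentence to indices once, up front, instead of inside each batch.
indexed_sentences = [[reverse_vocab.get(word, UNK_INDEX) for word in sentence]
                     for sentence in sentences]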

Threynaud avatar Mar 06 '17 12:03 Threynaud