Deep-Learning
Shapes do not match
Hi, it's me again, sorry!
I get this error when I run your code:

Traceback (most recent call last):
  File "cbow_model.py", line 49, in <module>
    word_context_product = merge([word_embedding, cbow], mode='dot')
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 1680, in merge
    name=name)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 1301, in __init__
    self.add_inbound_node(layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 635, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 172, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors, mask=input_masks))
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 1409, in call
    output = K.batch_dot(l1, l2, self.dot_axes)
  File "/usr/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 908, in batch_dot
    out = tf.matmul(x, y, adjoint_a=adj_x, adjoint_b=adj_y)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1855, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1454, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2397, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1757, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1707, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
    debug_python_shape_fn, require_shape_fn)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 675, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Shape must be rank 2 but is rank 3 for 'MatMul' (op: 'MatMul') with input shapes: [?,1,100], [?,100].
Embedding layers indeed return 3D tensors of shape (samples, sequence_length, embedding_dim)...
I used the Theano backend during testing; I haven't tested this on TensorFlow. You can change the Keras backend as specified here.
From looking at your error, I think you need to either flatten or reshape the word_embedding or cbow tensor.
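To see the rank mismatch concretely, here is a minimal diagnostic sketch (my own, not from the original script; it uses the Keras 1.x functional API, the layer names from the script further down, and a placeholder vocabulary size):

from keras import backend as K
from keras.layers import Input, Lambda
from keras.layers.embeddings import Embedding

EMBEDDING_DIM = 100
# 10000 is a hypothetical vocabulary size, for illustration only
shared_embedding_layer = Embedding(input_dim=10000, output_dim=EMBEDDING_DIM)

word_index = Input(shape=(1,))
context = Input(shape=(6,))  # 2 * window_size, with window_size = 3

word_embedding = shared_embedding_layer(word_index)
print(K.int_shape(word_embedding))   # (None, 1, 100) -- rank 3

cbow = Lambda(lambda x: K.mean(x, axis=1),
              output_shape=(EMBEDDING_DIM,))(shared_embedding_layer(context))
print(K.int_shape(cbow))             # (None, 100) -- rank 2

# merge([word_embedding, cbow], mode='dot') hands these to K.batch_dot; on the
# TensorFlow backend that ends in tf.matmul, which rejects mixed-rank inputs,
# exactly as the traceback above shows.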
It is actually crazy that the behavior is different between the two backends! I can't figure out how to make it work with TensorFlow! I can Flatten() word_embedding because its middle dimension is 1. However, flattening won't work for negative_words_embedding, since that dimension is of size n.
I think you can reshape with Keras.
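For reference, this is the reshape that ends up working (a sketch using the names defined in the full script further down, where these same lines appear): bring cbow up to rank 3 so both dot products operate on same-rank tensors, and pass dot_axes explicitly for the negative case.

cbow = Reshape((1, EMBEDDING_DIM))(cbow)
# cbow: (None, 1, emb_size), now the same rank as word_embedding

word_context_product = merge([word_embedding, cbow], mode='dot')
# (None, 1, 1)

negative_context_product = merge([negative_words_embedding, cbow],
                                 mode='dot', dot_axes=[2, 2])
# (None, negative, 1) -- dot along the embedding axis of both rank-3 tensors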
I managed to make it work, but it is painfully slow, even on a decent GPU. I think it is because too many parameters are being updated in the embedding layer at each iteration! I am investigating how partial updates can be done!
I'm working on the CNTK version. It uses sparse gradients for embeddings, so I hope it will be faster!
Actually, with the TensorFlow backend, you can use the TF optimizers directly, and they are already sparse! It's almost a 10x speedup on my laptop! Here is the TF-compatible version of your code, in case you are interested:
#!/usr/bin/env python
import logging

import numpy as np

from keras import backend as K
from keras.layers import Flatten, Input, Lambda, Reshape, merge
from keras.layers.embeddings import Embedding
from keras.models import Model

from save_embeddings import save_embeddings
from sentences_iterator import SentencesIterator
from vocab_generator import VocabGenerator

EMBEDDING_DIM = 100

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)

sentences = SentencesIterator('/Users/thms/clustree/data/test_vectorization/gcs/part-00000')
v_gen = VocabGenerator(sentences=sentences, min_count=20, window_size=3,
                       sample_threshold=-1, negative=5)
v_gen.scan_vocab()
v_gen.filter_vocabulary()
reverse_vocab = v_gen.generate_inverse_vocabulary_lookup('test_lookup')

# Generate the embedding matrix with all values between -1/2d and 1/2d
embedding = np.random.uniform(-1.0 / (2 * EMBEDDING_DIM),
                              1.0 / (2 * EMBEDDING_DIM),
                              (v_gen.vocab_size + 3, EMBEDDING_DIM))

# Creating the CBOW model. The model has 3 inputs: the current word index,
# the context word indices and the negative-sampled word indices.
word_index = Input(shape=(1,))
context = Input(shape=(2 * v_gen.window_size,))
negative_samples = Input(shape=(v_gen.negative,))

# All inputs are processed through a common embedding layer
shared_embedding_layer = Embedding(input_dim=(v_gen.vocab_size + 3),
                                   output_dim=EMBEDDING_DIM,
                                   weights=[embedding])
word_embedding = shared_embedding_layer(word_index)
# Shape output = (None, 1, emb_size)
context_embeddings = shared_embedding_layer(context)
# Shape output = (None, 2*window_size, emb_size)
negative_words_embedding = shared_embedding_layer(negative_samples)
# Shape output = (None, negative, emb_size)

# Context words are averaged to get the CBOW vector
cbow = Lambda(lambda x: K.mean(x, axis=1),
              output_shape=(EMBEDDING_DIM,))(context_embeddings)
# Shape output = (None, emb_size)
cbow = Reshape((1, EMBEDDING_DIM))(cbow)
# Shape output = (None, 1, emb_size)

# The context vector is multiplied (dot product) with the current word and
# the negative-sampled words
word_context_product = merge([word_embedding, cbow], mode='dot')
# Shape output = (None, 1, 1)
word_context_product = Flatten()(word_context_product)
# Shape output = (None, 1)
negative_context_product = merge([negative_words_embedding, cbow],
                                 mode='dot',
                                 dot_axes=[2, 2])
# Shape output = (None, negative, 1)
negative_context_product = Flatten()(negative_context_product)
# Shape output = (None, negative)

# The dot products are the model outputs
model = Model(input=[word_index, context, negative_samples],
              output=[word_context_product, negative_context_product])

# Binary cross-entropy is applied to the outputs
model.compile(optimizer=K.tf.train.RMSPropOptimizer(0.02),
              loss='binary_crossentropy')
print(model.summary())

model.fit_generator(v_gen.pretraining_batch_generator(reverse_vocab),
                    samples_per_epoch=v_gen.corpus_count,
                    nb_epoch=5)

# Save the trained embeddings
save_embeddings("embedding.txt",
                shared_embedding_layer.get_weights()[0],
                v_gen.vocabulary)
Note that I am calling a native TF optimizer directly through the backend module: K.tf.train.RMSPropOptimizer(0.02)
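If you would rather not reach into K.tf, I believe the equivalent is Keras's TFOptimizer wrapper, which is what Keras applies under the hood when it receives a raw TF optimizer (this is version-dependent, so treat it as an assumption on my part):

import tensorflow as tf
from keras.optimizers import TFOptimizer

# Same effect as optimizer=K.tf.train.RMSPropOptimizer(0.02) above
model.compile(optimizer=TFOptimizer(tf.train.RMSPropOptimizer(0.02)),
              loss='binary_crossentropy')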
I'll try this on an Amazon P2 instance with a big GPU to see how fast it can get when I have some time, but I feel like some other optimizations can be made, maybe by feeding whole batches as input instead of sending the data one array at a time. I guess the text preprocessing (converting words to indices) at each iteration is also quite costly, because there is a lot of overlap! I'll try to preprocess ahead of time to see if it improves performance!
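For the preprocessing idea, here is a rough sketch of what I have in mind. It assumes pretraining_batch_generator yields ([word, context, negatives], [positive_labels, negative_labels]) batches of numpy arrays (my assumption; adapt to the actual format), materializes one epoch up front, and trains with model.fit so the word-to-index conversion is not redone every epoch:

import numpy as np

gen = v_gen.pretraining_batch_generator(reverse_vocab)
n_batches = 1000  # hypothetical: however many batches make up one epoch

# Materialize the generator output once, up front
batches = [next(gen) for _ in range(n_batches)]
words = np.vstack([b[0][0] for b in batches])
contexts = np.vstack([b[0][1] for b in batches])
negatives = np.vstack([b[0][2] for b in batches])
pos_labels = np.vstack([b[1][0] for b in batches])
neg_labels = np.vstack([b[1][1] for b in batches])

# Reuse the preprocessed index arrays across epochs instead of regenerating them
model.fit([words, contexts, negatives], [pos_labels, neg_labels],
          batch_size=256, nb_epoch=5)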