
Attention Decoder with an output dimension in the tens of thousands.

Open · KushalDave opened this issue on Jan 02, 2019 · 2 comments

Hi Zafarali,

I am trying to use your attention network to learn seq2seq machine translation with attention. My source-language vocabulary has 32,000 entries and my target vocabulary has 34,000. The following step blows up RAM usage while building the model (understandably, since it is trying to allocate a 34K x 34K float matrix):

	self.W_o = self.add_weight(shape=(self.output_dim, self.output_dim),
							   name='W_o',
							   initializer=self.recurrent_initializer,
							   regularizer=self.recurrent_regularizer,
							   constraint=self.recurrent_constraint)
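
For a rough sense of scale (assuming float32 weights), this single matrix already needs several gigabytes before gradients and optimizer state are counted:

	# back-of-the-envelope estimate for W_o alone, assuming float32 (4 bytes per value)
	output_dim = 34000
	w_o_gb = output_dim * output_dim * 4 / 1e9
	print(w_o_gb)  # ~4.6 GB, and the optimizer typically keeps additional copies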

Here is my model (n_units: 128, src_vocab_size: 32000, tar_vocab_size: 34000, src_max_length: 11, tar_max_length: 11):

	from keras.models import Sequential
	from keras.layers import Embedding, LSTM
	# AttentionDecoder is the custom layer from this repository
	# (import path may differ depending on where the file lives in your checkout)
	from models.custom_recurrents import AttentionDecoder

	def define_model(n_units, src_vocab_size, tar_vocab_size, src_max_length, tar_max_length):
		model = Sequential()
		model.add(Embedding(src_vocab_size, n_units, input_length=src_max_length, mask_zero=True))
		model.add(LSTM(n_units, return_sequences=True))
		model.add(AttentionDecoder(n_units, tar_vocab_size))
		return model

Is there any fix for this?

KushalDave · Jan 02 '19

I have tried several things but can't get it working. Adding this weight bloats memory usage to over 2 GB and the code crashes.

KushalDave · Jan 05 '19

You could try changing the type of the weights to tf.float16, or another lower-precision dtype, to save memory.
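
A minimal sketch of that idea (untested; it assumes the stock Keras backend and that the layer does not hard-code a dtype) is to switch the default float type before the model is built:

	from keras import backend as K

	# make all subsequently created weights default to half precision
	K.set_floatx('float16')

	# build the model as before; W_o and the other weights are now float16,
	# roughly halving their memory footprint (note: float16 training can be
	# numerically unstable, so accuracy may suffer)
	model = define_model(128, 32000, 34000, 11, 11)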

zafarali · Jan 12 '19