
How to understand the parameters of the LSTM.lstm constructor

Open · YitzhakSp opened this issue on Jul 21, 2016 · 1 comment

I am struggling to understand the meaning of the parameters in

function LSTM.lstm(input_size, rnn_size, n, dropout) from model/LSTM.lua

Looking at the following line from train.lua, where the LSTM is built: protos.rnn = LSTM.lstm(vocab_size, opt.rnn_size, opt.num_layers, opt.dropout)

I am trying to understand these parameters with respect to the image http://karpathy.github.io/assets/rnn/charseq.jpeg

Why do we have input_size = vocab_size? The input at each time step is a single character, right?

If rnn_size is the length of the sequence, i.e. the number of blocks in the horizontal direction, shouldn't the input also be a sequence of characters of that length, which would mean input_size = vocab_size * rnn_size?

YitzhakSp · Jul 21 '16, 14:07

In fact, if you check the code carefully, the data pre-processing becomes clear. The original data is a stream of character indices of size [1, length(data)]. It is first reshaped to size [batch_size, num_batches * seq_length] and then split into x_batches, each of size [batch_size, seq_length].

In the training process, the outer loop (https://github.com/karpathy/char-rnn/blob/master/train.lua#L231) iterates over all num_batches batches, and the inner loop (https://github.com/karpathy/char-rnn/blob/master/train.lua#L236) walks one x_batch along the time dimension, with the index t ranging over seq_length; this is why x_batch is read column-wise (i.e. as the transpose). So x_batch[t] is the input at time t, a vector of size batch_size. The one-hot encoding of each character up to vocab_size happens inside the network (the OneHot module used in model/LSTM.lua), which is why input_size = vocab_size.
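Here is a minimal, self-contained Torch sketch of that reshaping, with toy sizes chosen only for illustration (variable names mirror the loader, but this is not the repo's exact code):

```lua
require 'torch'

-- toy sizes, chosen only for illustration
local batch_size, seq_length = 50, 50

-- stand-in for the stream of character indices, size [length(data)]
local data = torch.range(1, 10000)

-- chop off the tail so the stream divides evenly into batches
local len = data:size(1) - data:size(1) % (batch_size * seq_length)
data = data:sub(1, len)

-- reshape to [batch_size, num_batches * seq_length], then split along dim 2
-- into num_batches tensors of size [batch_size, seq_length]
local x_batches = data:view(batch_size, -1):split(seq_length, 2)
print(#x_batches)           -- num_batches

-- during training, column t of one batch is the input at time t
local x = x_batches[1]
for t = 1, seq_length do
  local x_t = x[{{}, t}]    -- a vector of size [batch_size]
end
```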
A similar analysis in LSTM.lua shows that inputs[2*L], the previous cell state of layer L, has size batch_size x rnn_size (and likewise inputs[2*L+1], the previous hidden state).
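For reference, here is a plain-Lua sketch of how that inputs table is laid out (the indices match model/LSTM.lua; the printing code itself is just illustrative):

```lua
-- inputs[1]     : x[t], a [batch_size] vector of character indices
-- inputs[2*L]   : prev_c for layer L, size [batch_size x rnn_size]
-- inputs[2*L+1] : prev_h for layer L, size [batch_size x rnn_size]
local n = 2                              -- number of layers, toy value
local names = { 'x' }
for L = 1, n do
  names[2*L]     = ('prev_c[%d]'):format(L)
  names[2*L + 1] = ('prev_h[%d]'):format(L)
end
for i, name in ipairs(names) do
  print(i, name)
end
```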

SJTUsuperxu · Nov 03 '16, 02:11