char-rnn
How to understand the parameters of the LSTM.lstm constructor
I am struggling to understand the meaning of the parameters in
function LSTM.lstm(input_size, rnn_size, n, dropout)
from model/LSTM.lua
Looking at the following line in train.lua, where the LSTM is built:
protos.rnn = LSTM.lstm(vocab_size, opt.rnn_size, opt.num_layers, opt.dropout)
I am trying to understand them with respect to the image http://karpathy.github.io/assets/rnn/charseq.jpeg
Why do we have input_size = vocab_size? The input at each step is a single character, right?
And if rnn_size is the length of the sequence, i.e. the number of blocks in the horizontal direction, shouldn't the input also be a sequence of characters of that length, which would mean input_size = vocab_size * rnn_size?
In fact, if you check the code carefully, the data pre-processing becomes clear. The original data has size [1, length(data)]; it is first reshaped to [batch_size, num_batches * seq_length], and then split into num_batches batches x_batches, each of size [batch_size, seq_length]. In training, the outer loop (https://github.com/karpathy/char-rnn/blob/master/train.lua#L231) iterates over all num_batches batches, and the inner loop (https://github.com/karpathy/char-rnn/blob/master/train.lua#L236) steps through the time indices t = 1, ..., seq_length within one batch; for this time-indexing the batch is transposed, so that x_batch[t] is exactly the input at time t, of size batch_size.
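If it helps, here is a minimal sketch of that pipeline (simplified from util/CharSplitLMMinibatchLoader.lua and the prepro step in train.lua; the sizes are made up for illustration):

```lua
require 'torch'

local batch_size, seq_length = 4, 5
local data = torch.range(1, 103)  -- stand-in for the 1D tensor of character indices

-- trim so the data divides evenly into batches, as the loader does
local num_batches = math.floor(data:nElement() / (batch_size * seq_length))
data = data:sub(1, batch_size * num_batches * seq_length)

-- reshape to [batch_size, num_batches * seq_length], then cut along dim 2
-- into num_batches pieces of size [batch_size, seq_length] each
local x_batches = data:view(batch_size, -1):split(seq_length, 2)

-- during training one batch is transposed to [seq_length, batch_size]
-- (the prepro step), so that x[t] is the time-t input of size batch_size
local x = x_batches[1]:transpose(1, 2):contiguous()
print(x[1]:nElement())  -- prints batch_size
```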
Based on the discussion above, the input x[t] during training has size batch_size: one character index per sequence in the batch.
A similar analysis in LSTM.lua shows that inputs[2*L] (the previous cell state of layer L) has size batch_size x rnn_size.
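For completeness, here is the relevant structure of model/LSTM.lua, lightly paraphrased (the size comments are my reading of the code, not part of the source):

```lua
require 'nngraph'

local n = 2  -- number of layers (opt.num_layers)
local inputs = {}
table.insert(inputs, nn.Identity()())    -- inputs[1]: x[t], character indices, size batch_size
for L = 1, n do
  table.insert(inputs, nn.Identity()())  -- inputs[2*L]:   prev_c[L], size batch_size x rnn_size
  table.insert(inputs, nn.Identity()())  -- inputs[2*L+1]: prev_h[L], size batch_size x rnn_size
end
-- inside the layer loop, the first layer one-hot encodes the index vector:
--   x = OneHot(input_size)(inputs[1])   -- size batch_size x input_size (= vocab_size)
```

This is also why input_size = vocab_size: each of the batch_size character indices is expanded by OneHot into a vector of length vocab_size, so the first layer sees a batch_size x vocab_size matrix. And rnn_size is the width of the hidden state, not the sequence length; the horizontal direction in the picture corresponds to the seq_length clones in train.lua.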