faster-rnnlm
Training with several hidden layers
Hi! I have some questions about faster-rnnlm. It is possible to use several hidden layers during training. My questions are:
- Which of them is used for the recurrent part?
- Does it use those hidden layers during decoding or computing entropy? Thanks!
Hi!
- All of them. The output of one layer is the input of the next one. For instance, if you have two tanh layers, the network looks like this (see the sketch below these answers):
  h1_t = tanh(x_t + W1 * h1_{t-1})
  h2_t = tanh(U * h1_t + W2 * h2_{t-1})
- Yes, it does.
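Here is a minimal sketch of how those two layers are chained at each timestep. This is not faster-rnnlm's actual code: the names, sizes, and weight values are made up for illustration, and it only shows the forward pass of the recurrent part (no output layer, no training).

```cpp
// Two stacked tanh recurrent layers, as in the formulas above.
// Illustrative sketch only, not the toolkit's implementation.
#include <cmath>
#include <cstdio>
#include <vector>

// One recurrent step: h_t = tanh(input + W * h_prev), where `input` is
// x_t for the first layer and U * h1_t for the second layer.
std::vector<float> RnnStep(const std::vector<float>& input,
                           const std::vector<float>& W,
                           const std::vector<float>& h_prev) {
  const int n = h_prev.size();
  std::vector<float> h(n);
  for (int i = 0; i < n; ++i) {
    float sum = input[i];
    for (int j = 0; j < n; ++j) sum += W[i * n + j] * h_prev[j];
    h[i] = std::tanh(sum);
  }
  return h;
}

// Dense matrix-vector product (U * v) for the connection between layers.
std::vector<float> MatVec(const std::vector<float>& M,
                          const std::vector<float>& v) {
  const int n = v.size();
  std::vector<float> out(n, 0.0f);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j) out[i] += M[i * n + j] * v[j];
  return out;
}

int main() {
  const int n = 8;  // hidden size (and embedding size, for simplicity)
  std::vector<float> W1(n * n, 0.01f), W2(n * n, 0.01f), U(n * n, 0.01f);
  std::vector<float> h1(n, 0.0f), h2(n, 0.0f);

  // A toy "sentence" of 3 identical word embeddings.
  std::vector<std::vector<float>> words(3, std::vector<float>(n, 0.5f));
  for (const auto& x : words) {
    h1 = RnnStep(x, W1, h1);              // h1_t = tanh(x_t + W1 * h1_{t-1})
    h2 = RnnStep(MatVec(U, h1), W2, h2);  // h2_t = tanh(U * h1_t + W2 * h2_{t-1})
  }
  // h2 would then be fed to the output (softmax / HS) layer.
  std::printf("h2[0] after 3 steps: %f\n", h2[0]);
  return 0;
}
```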
Ok. Is the output of the last hidden layer used as the input of the next neural network?
What do you mean by 'next neural network'? If you mean the next timestep (next word), then the answer is yes.
Yes, that's what I meant. Ok, thanks!
Does using several hidden layers perform better than using a single hidden layer? Which is better: a single hidden layer of size 400, or 4 hidden layers of size 100?
First, when you increase the layer size 4 times, training/evaluation time (in theory) increases 16 times (4 squared), while stacking layers only increases it linearly in the number of layers. So it's more reasonable to compare 1 layer of size 400 with 4 layers of size 200. However, I would recommend training a shallow network with a single layer first.
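To make the cost argument concrete, here is a back-of-the-envelope count of the recurrent part only (it ignores the embedding and the softmax/HS output layer, whose cost depends on the last layer's size and the vocabulary). A layer of size n costs roughly n^2 operations per timestep, so k stacked layers cost roughly k * n^2. The numbers are illustrative, not measurements from the toolkit.

```cpp
// Rough per-timestep cost of the recurrent part only: a layer of size n
// does an n x n matrix-vector product (~n^2 ops), so k stacked layers
// cost ~k * n^2. Illustrative numbers, not measurements.
#include <cstdio>

int main() {
  struct Config { const char* name; int layers; int size; };
  const Config configs[] = {
      {"1 layer  of size 400", 1, 400},  // ~160,000 ops per step
      {"4 layers of size 100", 4, 100},  // ~40,000 ops per step, ~4x cheaper
      {"4 layers of size 200", 4, 200},  // ~160,000 ops per step, comparable
  };
  for (const auto& c : configs)
    std::printf("%s: ~%d recurrent ops per timestep\n",
                c.name, c.layers * c.size * c.size);
  return 0;
}
```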
Hi! I have two different toolkits for training an rnnlm: the first one is rnnlm-hs-0.1b (Ilya-multithreading), and the second one is faster-rnnlm. faster-rnnlm is about 3 times faster than rnnlm-hs-0.1b with the same options. Is it expected that validation entropy at the end of training may be worse with faster-rnnlm than with rnnlm-hs-0.1b?
It's expected that the entropy will be more or less the same.