faster-rnnlm
Training with several hidden layers
Hi! I have some questions about faster-rnnlm. It is possible to use several hidden layers during training. My questions are:
- Which of them is used for the recurrent part?
- Does it use those hidden layers during decoding or computing entropy? Thanks!
Hi!
- All of them. The output of one layer is the input of the next one. For instance, if you have two tanh layers, the network looks like this (see the sketch below these answers):
  h1_t = tanh(x_t + W1 * h1_{t-1})
  h2_t = tanh(U * h1_t + W2 * h2_{t-1})
- Yes, it does.
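Here is a minimal sketch of how those two layers are chained at each timestep. This is not faster-rnnlm's actual code: the names, sizes, and weight values are made up for illustration, and it only shows the forward pass of the recurrent part (no output layer, no training).

```cpp
// Two stacked tanh recurrent layers, as in the formulas above.
// Illustrative sketch only, not the toolkit's implementation.
#include <cmath>
#include <cstdio>
#include <vector>

// One recurrent step: h_t = tanh(input + W * h_prev), where `input` is
// x_t for the first layer and U * h1_t for the second layer.
std::vector<float> RnnStep(const std::vector<float>& input,
                           const std::vector<float>& W,
                           const std::vector<float>& h_prev) {
  const int n = h_prev.size();
  std::vector<float> h(n);
  for (int i = 0; i < n; ++i) {
    float sum = input[i];
    for (int j = 0; j < n; ++j) sum += W[i * n + j] * h_prev[j];
    h[i] = std::tanh(sum);
  }
  return h;
}

// Dense matrix-vector product (U * v) for the connection between layers.
std::vector<float> MatVec(const std::vector<float>& M,
                          const std::vector<float>& v) {
  const int n = v.size();
  std::vector<float> out(n, 0.0f);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j) out[i] += M[i * n + j] * v[j];
  return out;
}

int main() {
  const int n = 8;  // hidden size (and embedding size, for simplicity)
  std::vector<float> W1(n * n, 0.01f), W2(n * n, 0.01f), U(n * n, 0.01f);
  std::vector<float> h1(n, 0.0f), h2(n, 0.0f);

  // A toy "sentence" of 3 identical word embeddings.
  std::vector<std::vector<float>> words(3, std::vector<float>(n, 0.5f));
  for (const auto& x : words) {
    h1 = RnnStep(x, W1, h1);              // h1_t = tanh(x_t + W1 * h1_{t-1})
    h2 = RnnStep(MatVec(U, h1), W2, h2);  // h2_t = tanh(U * h1_t + W2 * h2_{t-1})
  }
  // h2 would then be fed to the output (softmax / HS) layer.
  std::printf("h2[0] after 3 steps: %f\n", h2[0]);
  return 0;
}
```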
Ok. Is the output of the last hidden layer used as the input of the next neural network?
What do you mean by 'next neural network'? If you mean the next timestep (next word), then the answer is yes.
Yes, that's what I meant. Ok, thanks!
Does using several hidden layers perform better than using a single hidden layer? Which is better: a single hidden layer of size 400, or 4 hidden layers of size 100?
First, when you increase the layer size 4 times, training/evaluation time (in theory) increases 16 times (4 squared), while stacking layers only increases it linearly in the number of layers. So it's more reasonable to compare 1 layer of size 400 with 4 layers of size 200. However, I would recommend training a shallow network with a single layer first.
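To make the cost argument concrete, here is a back-of-the-envelope count of the recurrent part only (it ignores the embedding and the softmax/HS output layer, whose cost depends on the last layer's size and the vocabulary). A layer of size n costs roughly n^2 operations per timestep, so k stacked layers cost roughly k * n^2. The numbers are illustrative, not measurements from the toolkit.

```cpp
// Rough per-timestep cost of the recurrent part only: a layer of size n
// does an n x n matrix-vector product (~n^2 ops), so k stacked layers
// cost ~k * n^2. Illustrative numbers, not measurements.
#include <cstdio>

int main() {
  struct Config { const char* name; int layers; int size; };
  const Config configs[] = {
      {"1 layer  of size 400", 1, 400},  // ~160,000 ops per step
      {"4 layers of size 100", 4, 100},  // ~40,000 ops per step, ~4x cheaper
      {"4 layers of size 200", 4, 200},  // ~160,000 ops per step, comparable
  };
  for (const auto& c : configs)
    std::printf("%s: ~%d recurrent ops per timestep\n",
                c.name, c.layers * c.size * c.size);
  return 0;
}
```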
Hi! I have two different toolkits for training an rnnlm: the first one is rnnlm-hs-0.1b (Ilya-multithreading), and the second one is faster-rnnlm. faster-rnnlm is about 3 times faster than rnnlm-hs-0.1b with the same options. Is it expected that validation entropy at the end of training may be worse with faster-rnnlm than with rnnlm-hs-0.1b?
It's expected that the entropy will be more or less the same.