Anton Bakhtin
NaN means that the network has blown up. It's quite a problem for ReLU activation. "NNet rejected" means that the program recognized the problem and started to train from the previous...
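For intuition, here is a minimal sketch of that rejection logic (hypothetical names, not the actual faster-rnnlm code): when validation entropy turns NaN, the current weights are discarded and training restarts from the last accepted snapshot.

```
#include <cmath>
#include <cstdio>
#include <vector>

struct Net {
    std::vector<float> weights;
};

// Stub for illustration: a real trainer updates the weights and reports
// validation entropy; here we fake a blow-up (NaN) on epoch 2.
double TrainEpoch(Net* /*net*/, int epoch) {
    return (epoch == 2) ? std::nan("") : 4.5 - 0.1 * epoch;
}

// If validation entropy comes back NaN, the weights have blown up:
// report "NNet rejected" and roll back to the last accepted snapshot.
void TrainLoop(Net* net, int max_epochs) {
    Net snapshot = *net;  // last known-good model
    for (int epoch = 0; epoch < max_epochs; ++epoch) {
        double entropy = TrainEpoch(net, epoch);
        if (std::isnan(entropy)) {
            std::fprintf(stderr, "NNet rejected\n");
            *net = snapshot;  // restart from the previous model
        } else {
            std::printf("epoch %d entropy %.3f\n", epoch, entropy);
            snapshot = *net;  // accept this model
        }
    }
}

int main() {
    Net net{std::vector<float>(8, 0.1f)};
    TrainLoop(&net, 5);
}
```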
Try adding -D_FORCE_INLINES to NVCC_CFLAGS in the Makefile: https://github.com/yandex/faster-rnnlm/blob/master/faster-rnnlm/Makefile#L7

```
NVCC_CFLAGS = -D_FORCE_INLINES -O3 -march=native -funroll-loops
```
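(For context: the usual symptom is a memcpy-related compile error in string.h, which tends to show up when CUDA 7.x headers meet a newer glibc; -D_FORCE_INLINES is the standard workaround.)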
Currently only the CUDA test is required. Does it fail for you?
I'm trying to follow the KISS principle. At the moment the only non-trivial (and optional) dependency is CUDA. A higher-level build system would add one more. Of course, the makefile...
Hi! 1. All of them. The output of one layer is the input for the next one. For instance, if you have two tanh layers then the network looks like...
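As a rough sketch of what "the output of one layer is the input for the next one" means for two stacked recurrent tanh layers (hypothetical names, plain loops instead of the real matrix code):

```
#include <cmath>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;

// One step of a simple recurrent tanh layer: h = tanh(W * x + R * h_prev).
Vec TanhLayerStep(const Mat& W, const Mat& R, const Vec& x, const Vec& h_prev) {
    Vec h(R.size(), 0.0f);
    for (size_t i = 0; i < h.size(); ++i) {
        float s = 0.0f;
        for (size_t j = 0; j < x.size(); ++j) s += W[i][j] * x[j];
        for (size_t j = 0; j < h_prev.size(); ++j) s += R[i][j] * h_prev[j];
        h[i] = std::tanh(s);
    }
    return h;
}

// Two stacked layers: layer 2 consumes layer 1's output at the same time step.
Vec TwoLayerStep(const Mat& W1, const Mat& R1, Vec* h1,
                 const Mat& W2, const Mat& R2, Vec* h2, const Vec& x) {
    *h1 = TanhLayerStep(W1, R1, x, *h1);    // layer 1 reads the word embedding
    *h2 = TanhLayerStep(W2, R2, *h1, *h2);  // layer 2 reads layer 1's output
    return *h2;                             // the output layer sits on top of h2
}
```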
What is 'next neural network'? If you mean the next time step (the next word), then the answer is yes.
First, when you increase the layer size by a factor of 4, training/evaluation time (in theory) increases by a factor of 16 (4 squared). So it's more reasonable to compare 1 layer of size 400...
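To spell out the arithmetic (a sketch, assuming the hidden-to-hidden matrix with its $H^2$ entries dominates the per-step cost):

```
\text{cost per step} \propto H^2
\qquad\Rightarrow\qquad
\frac{\text{cost}(4H)}{\text{cost}(H)} = \frac{(4H)^2}{H^2} = 16
```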
It's expected that the entropy will be more or less the same.
Well, it seems that you need some more portable format to do this kind of stuff, e.g. text. It's not that hard to force faster-rnnlm to use a text format. -...
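As a generic illustration of the idea (not faster-rnnlm's actual I/O code), a text dump is portable across machines and compilers, unlike raw binary, as long as you print enough digits to round-trip each float:

```
#include <cstdio>
#include <vector>

// Hypothetical sketch: write weights as one number per line.
bool SaveText(const char* path, const std::vector<float>& w) {
    std::FILE* f = std::fopen(path, "w");
    if (!f) return false;
    std::fprintf(f, "%zu\n", w.size());
    for (float x : w) {
        std::fprintf(f, "%.9g\n", x);  // 9 significant digits round-trip a float
    }
    return std::fclose(f) == 0;
}
```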
I will definitely add some ASAP! And could you suggest any other implementations that can handle the one-billion-word dataset? I'm aware of the torch-based HS. Everything else seems to be...