Anton Bakhtin
NaN means that the network has blown up. It's quite a problem for ReLU activation. "NNet rejected" means that the program recognized the problem and started to train from the previous...
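For intuition, here is a minimal sketch of that rejection logic (hypothetical names, not the actual faster-rnnlm code): when validation entropy turns NaN, the current weights are discarded and training restarts from the last accepted snapshot.

```
#include <cmath>
#include <cstdio>
#include <vector>

struct Net {
    std::vector<float> weights;
};

// Stub for illustration: a real trainer updates the weights and reports
// validation entropy; here we fake a blow-up (NaN) on epoch 2.
double TrainEpoch(Net* /*net*/, int epoch) {
    return (epoch == 2) ? std::nan("") : 4.5 - 0.1 * epoch;
}

// If validation entropy comes back NaN, the weights have blown up:
// report "NNet rejected" and roll back to the last accepted snapshot.
void TrainLoop(Net* net, int max_epochs) {
    Net snapshot = *net;  // last known-good model
    for (int epoch = 0; epoch < max_epochs; ++epoch) {
        double entropy = TrainEpoch(net, epoch);
        if (std::isnan(entropy)) {
            std::fprintf(stderr, "NNet rejected\n");
            *net = snapshot;  // restart from the previous model
        } else {
            std::printf("epoch %d entropy %.3f\n", epoch, entropy);
            snapshot = *net;  // accept this model
        }
    }
}

int main() {
    Net net{std::vector<float>(8, 0.1f)};
    TrainLoop(&net, 5);
}
```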
Try adding -D_FORCE_INLINES to NVCC_CFLAGS in the Makefile: https://github.com/yandex/faster-rnnlm/blob/master/faster-rnnlm/Makefile#L7

```
NVCC_CFLAGS = -D_FORCE_INLINES -O3 -march=native -funroll-loops
```
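(For context: the usual symptom is a memcpy-related compile error in string.h, which tends to show up when CUDA 7.x headers meet a newer glibc; -D_FORCE_INLINES is the standard workaround.)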
Currently only the CUDA test is required. Does it fail for you?
I'm trying to follow the KISS principle. At the moment the only non-trivial (and optional) dependency is CUDA. A higher-level build system would add one more. Of course, the makefile...
Hi! 1. All of them. The output of one layer is the input for the next one. For instance, if you have two tanh layers then the network looks like...
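As a rough sketch of what "the output of one layer is the input for the next one" means for two stacked recurrent tanh layers (hypothetical names, plain loops instead of the real matrix code):

```
#include <cmath>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;

// One step of a simple recurrent tanh layer: h = tanh(W * x + R * h_prev).
Vec TanhLayerStep(const Mat& W, const Mat& R, const Vec& x, const Vec& h_prev) {
    Vec h(R.size(), 0.0f);
    for (size_t i = 0; i < h.size(); ++i) {
        float s = 0.0f;
        for (size_t j = 0; j < x.size(); ++j) s += W[i][j] * x[j];
        for (size_t j = 0; j < h_prev.size(); ++j) s += R[i][j] * h_prev[j];
        h[i] = std::tanh(s);
    }
    return h;
}

// Two stacked layers: layer 2 consumes layer 1's output at the same time step.
Vec TwoLayerStep(const Mat& W1, const Mat& R1, Vec* h1,
                 const Mat& W2, const Mat& R2, Vec* h2, const Vec& x) {
    *h1 = TanhLayerStep(W1, R1, x, *h1);    // layer 1 reads the word embedding
    *h2 = TanhLayerStep(W2, R2, *h1, *h2);  // layer 2 reads layer 1's output
    return *h2;                             // the output layer sits on top of h2
}
```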
What is 'next neural network'? If you mean the next time step (the next word), then the answer is yes.
First, when you increase the layer size by a factor of 4, training/evaluation time (in theory) increases by a factor of 16 (4 squared). So it's more reasonable to compare 1 layer of size 400...
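To spell out the arithmetic (a sketch, assuming the hidden-to-hidden matrix with its $H^2$ entries dominates the per-step cost):

```
\text{cost per step} \propto H^2
\qquad\Rightarrow\qquad
\frac{\text{cost}(4H)}{\text{cost}(H)} = \frac{(4H)^2}{H^2} = 16
```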
It's expected that the entropy will be more or less the same.
Well, it seems that you need some more portable format to do this kind of stuff, e.g. text. It's not that hard to force faster-rnnlm to use a text format. -...
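As a generic illustration of the idea (not faster-rnnlm's actual I/O code), a text dump is portable across machines and compilers, unlike raw binary, as long as you print enough digits to round-trip each float:

```
#include <cstdio>
#include <vector>

// Hypothetical sketch: write weights as one number per line.
bool SaveText(const char* path, const std::vector<float>& w) {
    std::FILE* f = std::fopen(path, "w");
    if (!f) return false;
    std::fprintf(f, "%zu\n", w.size());
    for (float x : w) {
        std::fprintf(f, "%.9g\n", x);  // 9 significant digits round-trip a float
    }
    return std::fclose(f) == 0;
}
```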
I will definitely add some ASAP! And could you suggest any other implementations that can handle the one-billion-word dataset? I'm aware of the torch-based HS. Everything else seems to be...