Reiji Hatsugai

7 comments of Reiji Hatsugai

@AjayTalati I don't know why this error occurs, but I could solve it by replacing L137 with rewards.append(float(max(min(reward, 1), -1))) (adding a float() call). I found another error in backpropagation...
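A minimal sketch of that fix in context (the surrounding function and the rewards list here are assumptions for illustration, not the actual file):

```python
rewards = []

def store_clipped_reward(reward):
    # Clip the environment reward to [-1, 1], as is standard in the A3C/DQN
    # setup, and cast to a plain Python float so a numpy scalar type does not
    # leak into the later loss computation (a plausible cause of the error).
    rewards.append(float(max(min(reward, 1), -1)))

store_clipped_reward(2.5)   # stores 1.0
store_clipped_reward(-0.3)  # stores -0.3
```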

@ypxie Your code is applicable only to simple optimization methods that carry no internal state, like SGD. In the paper, they used the RMSProp optimizer, which accumulates a running average of the squared gradients (per-parameter state). They...
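A minimal numpy sketch of the distinction (the update rule follows the standard RMSProp formulation; names and hyperparameter values are illustrative):

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Plain SGD: the update depends only on the current gradient, so there
    # is no optimizer state to share between processes.
    return w - lr * grad

class RMSProp:
    def __init__(self, shape, lr=7e-4, alpha=0.99, eps=1e-5):
        self.lr, self.alpha, self.eps = lr, alpha, eps
        # Running average of squared gradients: the per-parameter state that
        # a stateless update rule would lose. In A3C this buffer is what has
        # to be shared across the worker processes.
        self.square_avg = np.zeros(shape)

    def step(self, w, grad):
        self.square_avg = self.alpha * self.square_avg + (1 - self.alpha) * grad ** 2
        return w - self.lr * grad / (np.sqrt(self.square_avg) + self.eps)
```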

@ypxie Yes!

A torch.nn.Module must take a torch.autograd.Variable as input. But policy (which is a subclass of torch.nn.Module) receives a numpy.ndarray, so we have to convert the numpy.ndarray to a Variable. I fixed this problem; see my commit 9e9fb687786a025061561c7260ba9b586e9ca4ce.
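A minimal sketch of the conversion (this uses the old pre-0.4 torch.autograd.Variable API the comment refers to; the observation shape and the policy call are assumptions):

```python
import numpy as np
import torch
from torch.autograd import Variable

state = np.random.rand(4).astype(np.float32)  # e.g. a CartPole observation

# The network expects a Variable wrapping a torch tensor, not a raw numpy
# array, so convert before the forward pass; unsqueeze(0) adds a batch dim.
state_var = Variable(torch.from_numpy(state)).unsqueeze(0)
# action_probs = policy(state_var)
```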

There are some differences between my code and DeepMind's paper. My code has: 1. no LSTM, 2. no gradient clipping (see the sketch after this comment), 3. no hyperparameter tuning (I couldn't find...
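For the second point, a minimal sketch of how gradient clipping could be added in current PyTorch (clip_grad_norm_ is the modern API, and the max norm of 40.0 is an assumption, not a value from the original code):

```python
import torch

def train_step(model, loss, optimizer, max_grad_norm=40.0):
    optimizer.zero_grad()
    loss.backward()
    # Rescale all gradients so their global L2 norm is at most max_grad_norm,
    # preventing a single large update from destabilizing training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
```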

Yes, so it's not fast. It only simulates the results. If you want a fast implementation, you have to control the bit-level implementation on the GPU. For Theano: PyCUDA, e.g. https://github.com/MatthieuCourbariaux/BinaryNet/blob/master/Run-time/binary_ops.py, or for Chainer:...
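A rough illustration of the distinction (plain-Python/numpy sketch, not code from the linked repo): the simulation binarizes values but still does float arithmetic, while a genuinely fast version packs sign bits into machine words and replaces multiply-adds with XNOR and popcount.

```python
import numpy as np

def simulated_binary_dot(x, w):
    # "Simulation": binarize to +/-1 but keep float storage and float
    # multiplies, so it is no faster than an ordinary dot product.
    return float(np.sign(x) @ np.sign(w))

def packed_binary_dot(x, w):
    # Fast path in spirit: pack sign bits into integers; XNOR + popcount
    # replaces the float multiply-adds. Real speedups require doing this
    # inside a GPU kernel, as in the linked binary_ops.py.
    n = len(x)
    x_bits = sum(1 << i for i, v in enumerate(x) if v >= 0)
    w_bits = sum(1 << i for i, v in enumerate(w) if v >= 0)
    matches = ~(x_bits ^ w_bits) & ((1 << n) - 1)  # XNOR, masked to n bits
    return 2 * bin(matches).count("1") - n         # dot of +/-1 vectors

x = np.array([0.3, -1.2, 0.7, -0.1])
w = np.array([-0.5, -0.8, 0.9, 0.4])
assert simulated_binary_dot(x, w) == packed_binary_dot(x, w)
```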

Thanks! I found it here: https://github.com/joschu/modular_rl/blob/master/modular_rl/core.py#L59-L62.