Reiji Hatsugai

7 comments of Reiji Hatsugai

@AjayTalati I don't know why this error occurs, but I could solve it by replacing L137 with rewards.append(float(max(min(reward, 1), -1))) (adding a float() call). I found another error in backpropagation...
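A minimal sketch of that fix in context (the surrounding function and the rewards list here are assumptions for illustration, not the actual file):

```python
rewards = []

def store_clipped_reward(reward):
    # Clip the environment reward to [-1, 1], as is standard in the A3C/DQN
    # setup, and cast to a plain Python float so a numpy scalar type does not
    # leak into the later loss computation (a plausible cause of the error).
    rewards.append(float(max(min(reward, 1), -1)))

store_clipped_reward(2.5)   # stores 1.0
store_clipped_reward(-0.3)  # stores -0.3
```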

@ypxie Your code is applicable only to simple optimization methods that carry no internal state, like SGD. In the paper, they used the RMSProp optimizer, which accumulates a running average of the squared gradients (per-parameter state). They...
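A minimal numpy sketch of the distinction (the update rule follows the standard RMSProp formulation; names and hyperparameter values are illustrative):

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Plain SGD: the update depends only on the current gradient, so there
    # is no optimizer state to share between processes.
    return w - lr * grad

class RMSProp:
    def __init__(self, shape, lr=7e-4, alpha=0.99, eps=1e-5):
        self.lr, self.alpha, self.eps = lr, alpha, eps
        # Running average of squared gradients: the per-parameter state that
        # a stateless update rule would lose. In A3C this buffer is what has
        # to be shared across the worker processes.
        self.square_avg = np.zeros(shape)

    def step(self, w, grad):
        self.square_avg = self.alpha * self.square_avg + (1 - self.alpha) * grad ** 2
        return w - self.lr * grad / (np.sqrt(self.square_avg) + self.eps)
```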

@ypxie Yes!

A torch.nn.Module must take a torch.autograd.Variable as input. But policy (which is a subclass of torch.nn.Module) receives a numpy.ndarray, so we have to convert the numpy.ndarray to a Variable. I fixed this problem; see my commit 9e9fb687786a025061561c7260ba9b586e9ca4ce.
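A minimal sketch of the conversion (this uses the old pre-0.4 torch.autograd.Variable API the comment refers to; the observation shape and the policy call are assumptions):

```python
import numpy as np
import torch
from torch.autograd import Variable

state = np.random.rand(4).astype(np.float32)  # e.g. a CartPole observation

# The network expects a Variable wrapping a torch tensor, not a raw numpy
# array, so convert before the forward pass; unsqueeze(0) adds a batch dim.
state_var = Variable(torch.from_numpy(state)).unsqueeze(0)
# action_probs = policy(state_var)
```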

There are some differences between my code and DeepMind's paper. My code has: 1. no LSTM, 2. no gradient clipping (see the sketch after this comment), 3. no hyperparameter tuning (I couldn't find...
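For the second point, a minimal sketch of how gradient clipping could be added in current PyTorch (clip_grad_norm_ is the modern API, and the max norm of 40.0 is an assumption, not a value from the original code):

```python
import torch

def train_step(model, loss, optimizer, max_grad_norm=40.0):
    optimizer.zero_grad()
    loss.backward()
    # Rescale all gradients so their global L2 norm is at most max_grad_norm,
    # preventing a single large update from destabilizing training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
```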

Yes, so it's not fast. It only simulates the results. If you want a fast implementation, you have to control the bit-level implementation on the GPU. For Theano: PyCUDA, e.g. https://github.com/MatthieuCourbariaux/BinaryNet/blob/master/Run-time/binary_ops.py, or for Chainer:...
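A rough illustration of the distinction (plain-Python/numpy sketch, not code from the linked repo): the simulation binarizes values but still does float arithmetic, while a genuinely fast version packs sign bits into machine words and replaces multiply-adds with XNOR and popcount.

```python
import numpy as np

def simulated_binary_dot(x, w):
    # "Simulation": binarize to +/-1 but keep float storage and float
    # multiplies, so it is no faster than an ordinary dot product.
    return float(np.sign(x) @ np.sign(w))

def packed_binary_dot(x, w):
    # Fast path in spirit: pack sign bits into integers; XNOR + popcount
    # replaces the float multiply-adds. Real speedups require doing this
    # inside a GPU kernel, as in the linked binary_ops.py.
    n = len(x)
    x_bits = sum(1 << i for i, v in enumerate(x) if v >= 0)
    w_bits = sum(1 << i for i, v in enumerate(w) if v >= 0)
    matches = ~(x_bits ^ w_bits) & ((1 << n) - 1)  # XNOR, masked to n bits
    return 2 * bin(matches).count("1") - n         # dot of +/-1 vectors

x = np.array([0.3, -1.2, 0.7, -0.1])
w = np.array([-0.5, -0.8, 0.9, 0.4])
assert simulated_binary_dot(x, w) == packed_binary_dot(x, w)
```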

Thanks! I found it here: https://github.com/joschu/modular_rl/blob/master/modular_rl/core.py#L59-L62.