Yasuhiro Fujita
Wow, good catch! The sign seems incorrect. Thank you for reporting it. If I understand correctly, `tau` in the paper actually corresponds to `1-tau` in ChainerRL's IQN, because `|tau...
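For the record, the equivalence can be checked numerically. Below is a minimal NumPy sketch (function names here are illustrative, not ChainerRL's actual code), assuming the quantile Huber loss `|tau - 1{u < 0}| * L_kappa(u) / kappa` from the IQN paper:

```
import numpy as np

def huber(u, kappa=1.0):
    # Huber loss; symmetric in u, so L_kappa(-u) == L_kappa(u).
    return np.where(np.abs(u) <= kappa,
                    0.5 * u ** 2,
                    kappa * (np.abs(u) - 0.5 * kappa))

def quantile_huber(u, tau, kappa=1.0):
    # rho^kappa_tau(u) = |tau - 1{u < 0}| * L_kappa(u) / kappa
    return np.abs(tau - (u < 0)) * huber(u, kappa) / kappa

# Flipping the sign convention of the TD error u is equivalent to
# replacing tau with 1 - tau (the case u == 0 contributes zero loss).
u = np.linspace(-3.0, 3.0, 601)
tau = 0.3
assert np.allclose(quantile_huber(u, tau), quantile_huber(-u, 1.0 - tau))
```

So whether the code matches the paper depends on whether `u` is computed as target minus prediction or the reverse.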
@uezo kindly implemented TicTacToe! http://qiita.com/uezo/items/87b25c93199d72a56a9a#%E5%8F%82%E8%80%83%E3%82%B5%E3%82%A4%E3%83%88
Thank you for the improvements to PCL. I haven't checked the implementation details yet, but I think solving the memory issue is great as long as it won't make training...
Good catch. The problem comes from the fact that resuming agent training via `step_offset` is not well tested.
@ElliotWay Interesting. Which game did you try? When I tuned `train_acer_ale.py`, I found it to be much more sample-efficient than A3C on Breakout with the default parameters.
@ElliotWay Thank you. It is possible there has been some regression in ChainerRL. It should be investigated.
~~Links cannot be deepcopied after `to_device('native')`. We need to find a workaround or wait until it's fixed. https://github.com/chainer/chainer/issues/5916~~ solved
Async training requires https://github.com/chainer/chainer/issues/5931 to be fixed.
```
import copy

import numpy as np


def deepcopy_link(link):
    # Temporarily move the link to NumPy (CPU), where deepcopy works,
    # then restore both the original and the copy to the original device.
    device = link.device
    link.to_device(np)
    new_link = copy.deepcopy(link)
    link.to_device(device)
    new_link.to_device(device)
    return new_link
```
This can be a workaround for deepcopying links.
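For example (a hypothetical usage sketch, with `deepcopy_link` from above and a ChainerX-enabled build):

```
import chainer

link = chainer.links.Linear(4, 2)
link.to_device('native')  # after this, plain copy.deepcopy fails (chainer#5916)
copied = deepcopy_link(link)  # round-trips through NumPy, so it works
```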
ChainerX currently does not support advanced indexing, which prevents us from applying it to CategoricalDQN and IQN. https://github.com/chainer/chainer/issues/5944
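To illustrate the kind of indexing involved (shapes and variable names here are illustrative, not ChainerRL's exact code): both algorithms need to select, for each transition in a batch, the predicted return distribution of the action that was taken, which is naturally written with advanced indexing:

```
import numpy as np

batch_size, n_actions, n_atoms = 4, 6, 51
# Per-action return distributions: (batch_size, n_actions, n_atoms).
z = np.random.rand(batch_size, n_actions, n_atoms).astype(np.float32)
actions = np.array([2, 0, 5, 1])

# Advanced indexing with two integer arrays: pick the distribution of
# the taken action for each transition -> (batch_size, n_atoms).
z_taken = z[np.arange(batch_size), actions]
assert z_taken.shape == (batch_size, n_atoms)
```

This integer-array indexing pattern is what ChainerX cannot execute yet.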