Kai Arulkumaran comments

Results: 86 comments by Kai Arulkumaran

Implement optimality tightening

Awesome - I'll try and have a look soon or next week! Would you be able to test it to try and replicate one of the results from the paper?...

Implement optimality tightening

In my experience the smallest details in a paper can be key to reproducing results - and these may be missing or ambiguous. If anyone is reasonably confident in their...

Why is the current sharedRmsprop thread safe?

Looks fine to me but I'll leave it to @lake4790k's discretion.

Finish prioritised experience replay

**Note:** It might be worth subclassing the Heap from [torchlib](https://github.com/vzhong/torchlib) for the priority queue.
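For anyone picking this up, here is a rough sketch of the kind of priority queue the note has in mind. It is an illustration only: it uses Python's standard `heapq` as a stand-in rather than the torchlib `Heap`, and the class name `PriorityReplayQueue` and its methods are hypothetical, not part of this repo or of torchlib. Priorities are updated lazily, by pushing a fresh entry and skipping stale ones on pop.

```python
import heapq
import itertools


class PriorityReplayQueue:
    """Hypothetical max-priority queue over transition indices.

    Assumption: each transition is identified by an integer index and
    carries a single float priority (e.g. an absolute TD error).
    """

    def __init__(self):
        self._heap = []                    # entries: (-priority, stamp, index)
        self._latest = {}                  # index -> stamp of its live entry
        self._counter = itertools.count()  # unique, increasing stamps

    def push(self, index, priority):
        """Insert a transition, or update its priority if already present."""
        stamp = next(self._counter)
        self._latest[index] = stamp
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, stamp, index))

    def pop(self):
        """Remove and return (index, priority) with the highest priority."""
        while self._heap:
            neg_priority, stamp, index = heapq.heappop(self._heap)
            # Skip entries superseded by a later push for the same index.
            if self._latest.get(index) == stamp:
                del self._latest[index]
                return index, -neg_priority
        raise IndexError("pop from an empty queue")

    def __len__(self):
        return len(self._latest)


queue = PriorityReplayQueue()
queue.push(0, 0.5)
queue.push(1, 2.0)
queue.push(0, 3.0)  # re-pushing index 0 replaces its old priority
top_index, top_priority = queue.pop()
```

Lazy invalidation keeps both operations at O(log n) without needing a decrease-key primitive; a subclass of an existing heap could instead track positions and sift entries in place.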

Allow overwritting action by environment

Code looks fine, but what is this trying to achieve? If the chosen action may not be deterministically executed in the environment, the agent should still treat it as if...

Allow overwritting action by environment

Got it - can you add a short 2nd paragraph to the [custom docs](https://github.com/Kaixhin/Atari#custom) to make people aware of this modification from the `rlenvs` API, along with a use-case as...

Recurrent Dqn

Yep, a switch for using a DRQN architecture would be great. For now I'd go for using `histLen` as the number of frames to use BPTT on for a single-frame...

@lake4790k Almost have something working. Disabling [this line](https://github.com/Kaixhin/Atari/blob/rnn/Model.lua#L157) lets the DRQN train, as otherwise it crashes [here](https://github.com/Kaixhin/Atari/blob/rnn/Agent.lua#L462), somehow propagating a batch of size 20 forward but expecting the normal batch...

Recurrent Dqn

@lake4790k I'd have to delve into the original paper/code, but it looks like they train the network every step (as opposed to every 4). This seems like it'll be a...

Recurrent Dqn

Here's the result of running `./run.sh demo -recurrent true`, so I'm reasonably confident that the DRQN is capable of learning, but I'm not testing this further for now so I'm...