Kai Arulkumaran
Basically these are parameters that aren't updated via gradient descent, but would still be serialised - a good example that already exists here is the running mean or running variance in batch normalisation.
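For illustration, this is how PyTorch exposes such state via `register_buffer` (a minimal sketch, assuming a PyTorch setup; the `RunningNorm` module is invented for the example):

```python
import torch
import torch.nn as nn

class RunningNorm(nn.Module):
    """Tracks a running mean that is serialised with the model,
    but never updated by gradient descent."""
    def __init__(self, num_features, momentum=0.1):
        super().__init__()
        # Buffers appear in state_dict() but not in parameters()
        self.register_buffer('running_mean', torch.zeros(num_features))
        self.momentum = momentum

    def forward(self, x):
        if self.training:
            # Update running statistics outside of autograd
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * x.mean(0))
        return x - self.running_mean

m = RunningNorm(4)
print([n for n, _ in m.named_parameters()])  # [] - nothing for the optimiser
print(list(m.state_dict().keys()))           # ['running_mean'] - still serialised
```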
[Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening](https://arxiv.org/abs/1611.01606) potentially speeds up Q-learning by an order of magnitude! Apparently not too hard to implement either.
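The core idea is to constrain Q(s_t, a_t) between lower bounds built from future rewards and upper bounds built from past ones, adding quadratic penalties when the constraints are violated. A rough sketch under those assumptions (the function, argument names and data layout are all invented for the example):

```python
def tightening_penalty(q_t, future_rewards, q_next_max, past_q, past_rewards,
                       gamma=0.99, weight=4.0):
    """Penalty on Q(s_t, a_t) from bound violations (hypothetical layout):
    future_rewards[i]: r_{t+i};  q_next_max[j]: max_a Q(s_{t+j+1}, a);
    past_q[j]: Q(s_{t-j-1}, a_{t-j-1});  past_rewards[j][i]: r_{t-j-1+i}."""
    # Lower bounds: discounted rollouts of future rewards plus a bootstrap
    lower = []
    for j in range(len(future_rewards)):
        ret = sum(gamma ** i * future_rewards[i] for i in range(j + 1))
        lower.append(ret + gamma ** (j + 1) * q_next_max[j])
    # Upper bounds: rearrange the same Bellman identity around past state-actions
    upper = []
    for j in range(len(past_q)):
        ret = sum(gamma ** i * past_rewards[j][i] for i in range(j + 1))
        upper.append((past_q[j] - ret) / gamma ** (j + 1))
    # Quadratic penalties whenever either bound is violated
    violation = max(0.0, max(lower) - q_t) ** 2 + max(0.0, q_t - min(upper)) ** 2
    return weight * violation
```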
Make the function less monolithic by factoring out update rules, e.g. persistent advantage learning.
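A sketch of what factored-out update rules could look like (function and argument names are invented; `q_curr`/`q_next` would be the target network's Q-values at the current/next state), based on the targets from Bellemare et al.'s "Increasing the Action Gap" paper:

```python
def q_target(r, q_next, gamma=0.99, terminal=False):
    # Standard Q-learning target: r + gamma * max_a' Q(s', a')
    return r if terminal else r + gamma * max(q_next)

def al_target(r, q_curr, q_next, a, alpha=0.9, gamma=0.99, terminal=False):
    # Advantage learning: widen the action gap at the current state
    return q_target(r, q_next, gamma, terminal) - alpha * (max(q_curr) - q_curr[a])

def pal_target(r, q_curr, q_next, a, alpha=0.9, gamma=0.99, terminal=False):
    # Persistent AL: also consider the gap at the next state, keep the larger target
    persistent = q_target(r, q_next, gamma, terminal) - alpha * (max(q_next) - q_next[a])
    return max(al_target(r, q_curr, q_next, a, alpha, gamma, terminal), persistent)
```

Each rule then becomes a drop-in replacement inside the learning step, rather than a branch of one monolithic function.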
[Learning values across many orders of magnitude](http://arxiv.org/abs/1602.07714) introduces Preserving Outputs Precisely, while Adaptively Rescaling Targets (Pop-Art). In summary, it adaptively normalises targets across many orders of magnitude and gets rid of the need for reward clipping.
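A rough NumPy sketch of the mechanics (class and method names are invented; the real algorithm also lower-bounds the scale): the ART step updates running target statistics, and the POP step rewrites the final linear layer so the unnormalised outputs are exactly preserved:

```python
import numpy as np

class PopArtLayer:
    """Sketch of Pop-Art: adaptively rescale targets whilst
    preserving the network's unnormalised outputs."""
    def __init__(self, n_hidden, beta=1e-4):
        self.w = np.random.randn(n_hidden) * 0.1  # final linear layer
        self.b = 0.0
        self.mu, self.nu = 0.0, 1.0  # running 1st/2nd moments of targets
        self.beta = beta

    @property
    def sigma(self):
        return np.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def forward(self, h):
        # Unnormalised prediction: sigma * (w . h + b) + mu
        return self.sigma * (self.w @ h + self.b) + self.mu

    def update_stats(self, target):
        old_mu, old_sigma = self.mu, self.sigma
        # ART: adaptively rescale targets via running moments
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        # POP: preserve outputs precisely by inverting the rescale
        self.w *= old_sigma / self.sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / self.sigma
```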
Rank-based prioritised experience replay appears to be working, but technically needs some changes: instead of storing terminal states with a priority of 0, they should not be stored at all.
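For context, under rank-based prioritisation the sampling probability depends only on the rank of |TD error|, so a permanently-zero-priority entry is dead weight in the data structure. A toy sketch of the sampling distribution (hypothetical helper, ignoring the binary heap used in practice):

```python
import numpy as np

def rank_based_probabilities(td_errors, alpha=0.7):
    """Priority is the reciprocal of the rank of |TD error|,
    which is robust to outliers."""
    ranks = np.empty_like(td_errors, dtype=np.int64)
    order = np.argsort(-np.abs(td_errors))  # descending by magnitude
    ranks[order] = np.arange(1, len(td_errors) + 1)  # best rank = 1
    priorities = (1.0 / ranks) ** alpha
    return priorities / priorities.sum()

probs = rank_based_probabilities(np.array([0.5, -2.0, 0.1, 1.0]))
batch = np.random.choice(4, size=2, p=probs)  # indices to replay
```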
[Control of Memory, Active Perception, and Action in Minecraft](https://arxiv.org/abs/1605.09128) introduces a memory Q-network (MQN) and recurrent MQN (RMQN), based on a _relatively_ simple key-value soft attention memory. These could feasibly...
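The memory itself reduces to standard soft attention over key-value slots; a minimal NumPy sketch (names are invented, and in the paper the keys/values are linear maps of convolutional features from the last M frames):

```python
import numpy as np

def key_value_attention(query, keys, values):
    """Soft key-value attention over M memory slots."""
    scores = keys @ query                 # (M,) similarity per slot
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over memory slots
    return weights @ values               # convex combination of value slots

M, d = 8, 16
context = np.random.randn(d)              # query built from recent observations
mem_keys = np.random.randn(M, d)          # keys from the last M frames
mem_values = np.random.randn(M, d)        # values from the last M frames
memory_out = key_value_attention(context, mem_keys, mem_values)
```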
[Safe and efficient off-policy reinforcement learning](https://arxiv.org/abs/1606.02647) implements its new algorithm, Retrace(λ), with experience replay, but actually uses asynchronous agents with experience replay for testing (the combination was going to happen soon...
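Retrace(λ) corrects off-policy returns with truncated importance weights c_t = λ·min(1, π(a_t|x_t)/μ(a_t|x_t)). A sketch over one sampled trajectory (argument names are invented; terminal handling omitted for brevity):

```python
import numpy as np

def retrace_targets(q, rewards, pi, mu, exp_q_next, gamma=0.99, lam=1.0):
    """Retrace(lambda) targets along one trajectory (hypothetical layout):
    q[t] = Q(x_t, a_t);  pi[t], mu[t] = target/behaviour probs of a_t;
    exp_q_next[t] = E_{a~pi} Q(x_{t+1}, a)."""
    T = len(rewards)
    c = lam * np.minimum(1.0, pi / mu)  # truncated importance weights
    targets = np.copy(q)
    g = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * exp_q_next[t] - q[t]  # TD error
        g = delta + (gamma * c[t + 1] * g if t + 1 < T else 0.0)
        targets[t] = q[t] + g
    return targets
```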
The test on Beam Rider is failing badly, and does not look promising.
All images should move from Ubuntu 14.04 LTS to 16.04 LTS, except for CUDA images, where only versions ≥ 8.0 should migrate (see https://github.com/NVIDIA/nvidia-docker/issues/110).

- [ ] **brainstorm**
- [ ]...
Images in bold have builds disabled by removing their dependent linked repository.

- [ ] **cuda-torch**
- [ ] **keras**
- [ ] **cuda-keras**
- [ ] **pylearn2**
- [...