General discussion on State-of-the-Art Research
A lot of research in the field of RL is being done nowadays. I thought it could be both interesting and productive to have a post that brings up new research from time to time that might be relevant to this project.
CURIOSITY-DRIVEN LEARNING – EXPLORATION BY RANDOM NETWORK DISTILLATION
OpenAI has recently published a paper describing a new architecture extension for dealing with the 'hard exploration' problem in Atari games. It works by strongly rewarding the policy for exploring 'states of interest' that would normally be ignored due to their complexity.
This paper introduces an exploration bonus that is particularly simple to implement, works well with high-dimensional observations, can be used with any policy optimization algorithm, and is efficient to compute as it requires only a single forward pass of a neural network on a batch of experience. Our exploration bonus is based on the observation that neural networks tend to have significantly lower prediction errors on examples similar to those on which they have been trained. This motivates the use of prediction errors of networks trained on the agent’s past experience to quantify the novelty of new experience.
For a better overview of the paper, this blog post also offers a nice diagram of the network.
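To give a concrete feel for the idea, here is a minimal sketch of the RND bonus in PyTorch-style Python; the network sizes and the names `target_net` / `predictor_net` are my own illustrative choices, not the paper's code:

```python
import torch
import torch.nn as nn

obs_dim, feat_dim = 64, 32  # illustrative sizes, not from the paper

# Fixed, randomly initialized target network (never trained).
target_net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
for p in target_net.parameters():
    p.requires_grad = False

# Predictor network, trained to match the target on visited states.
predictor_net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
optimizer = torch.optim.Adam(predictor_net.parameters(), lr=1e-4)

def intrinsic_reward(obs_batch):
    """Prediction error on a batch of observations: high on novel states."""
    with torch.no_grad():
        target_feat = target_net(obs_batch)
    pred_feat = predictor_net(obs_batch)
    error = ((pred_feat - target_feat) ** 2).mean(dim=-1)  # per-observation bonus
    # Train the predictor on the same batch, so familiar states get a low bonus.
    loss = error.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return error.detach()
```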
On the same topic, Uber has announced on their blog that they have achieved significantly better results on the 'hard exploration' problem, but for now no paper has been published.
@Kismuz, it looks like Uber's new Go-Explore algorithm made something of a breakthrough: https://eng.uber.com/go-explore/
Population Based Training (PBT) of Neural Networks
DeepMind published a paper last year on 'lazy' hyperparameter tuning through self-discovery of an optimal hyperparameter set. Each worker trains with a small perturbation of the hyperparameters, and during training the framework evaluates the best-performing worker(s) and updates the other workers accordingly to keep exploring toward the optimal set (the algorithm was tested on A3C).
PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. With just a small modification to a typical distributed hyperparameter training framework, our method allows robust and reliable training of models. We demonstrate the effectiveness of PBT on deep reinforcement learning problems, showing faster wall-clock convergence and higher final performance of agents by optimising over a suite of hyperparameters
An implementation of PBT can be found in the Ray Tune library; a minimal sketch of the core step follows.
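Here is a rough sketch of one exploit/explore step, just to illustrate the mechanism; the worker dictionary layout, the quartile cutoff and the perturbation factors are illustrative assumptions, not the paper's or Ray Tune's actual code:

```python
import copy
import random

def pbt_step(population):
    """One exploit/explore step over a population of workers.

    Each worker is assumed to be a dict like
    {'score': float, 'hyperparams': {...}, 'weights': ...}.
    """
    ranked = sorted(population, key=lambda w: w['score'])
    cutoff = max(1, len(ranked) // 4)
    bottom, top = ranked[:cutoff], ranked[-cutoff:]
    for worker in bottom:
        donor = random.choice(top)
        # Exploit: copy weights and hyperparameters from a top performer.
        worker['weights'] = copy.deepcopy(donor['weights'])
        worker['hyperparams'] = dict(donor['hyperparams'])
        # Explore: perturb each hyperparameter by a random factor,
        # so the population keeps searching around the current best set.
        for name, value in worker['hyperparams'].items():
            worker['hyperparams'][name] = value * random.choice([0.8, 1.2])
    return population
```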
Unsupervised Predictive Memory in a Goal-Directed Agent
DeepMind recently published a paper in which they present a new external memory architecture (MERLIN) that is based on research from neuroscience. External memory drastically enhances the model's ability to access relevant temporal context, far beyond LSTM capabilities.
We propose MERLIN, an integrated AI agent architecture that acts in partially observed virtual reality environments and stores information in memory based on different principles from existing end-to-end AI systems: it learns to process high-dimensional sensory streams, compress and store them, and recall events with less dependence on task reward. We bring together ingredients from external memory systems, reinforcement learning, and state estimation (inference) models and combine them into a unified system using inspiration from three ideas originating in psychology and neuroscience: predictive sensory coding, the hippocampal representation theory of Gluck and Myers, and the temporal context model and successor representation
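Just to give a flavour of the external-memory idea (this is not MERLIN's actual architecture, which is far more involved), a minimal content-based read/write over a memory matrix might look like this; all names and sizes are illustrative:

```python
import torch
import torch.nn.functional as F

mem_slots, mem_dim = 128, 32  # illustrative sizes
memory = torch.zeros(mem_slots, mem_dim)

def write(state_embedding, step):
    """Store a state embedding, overwriting slots in ring-buffer fashion."""
    memory[step % mem_slots] = state_embedding

def read(query):
    """Content-based read: softmax over cosine similarity to all rows.

    Events similar to the query can be recalled regardless of how long
    ago they were written, unlike an LSTM's decaying hidden state.
    """
    scores = F.cosine_similarity(memory, query.unsqueeze(0), dim=-1)
    weights = F.softmax(scores, dim=0)
    return weights @ memory  # weighted sum of memory rows
```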
Soft actor-critic (SAC) algorithm from UC Berkeley and Google Brain:
Blog post: https://bair.berkeley.edu/blog/2018/12/14/sac/
Paper: https://drive.google.com/file/d/1J8gZXJN0RqH-TkTh4UEikYSy8AqPTy9x/view
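For reference, the defining idea behind SAC is its maximum-entropy objective, which augments the expected return with the policy's entropy; the temperature $\alpha$ trades off reward against exploration:

$$J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[\, r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]$$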
Wow, this is really impressive; this algorithm has some very nice properties in addition to showing great results.
@Kismuz, do you have any thoughts on bringing the Soft Actor-Critic algorithm to BTGym?
GitHub repo by Berkeley
@JacobHanouna, yes in general, but not at the moment - there are many exciting things that could (and should) be done here: novel algorithms, network architectures, proper GPU support, live trading APIs, and backtest parser presets for various types of assets, to mention a few. But in my belief, and partially due to limited resources (a single head and pair of hands), those are secondary objectives to implementing at least a single algorithmic solution that can be justified as 'stable performing', at least on an out-of-sample backtest. At the moment, implementing a model-based mean-reverting pairs-trading setup is my priority. I think of the features implemented in the package as 'least acceptable baseline' building blocks supporting such research. Once any such result is established and proven to be effective, one can go down and improve it by refining the base components, as that is more of a software engineering job.
Of course, I do welcome any contributions regarding the aspects mentioned.
Recent high-level review from the JPMorgan research group: Idiosyncrasies and challenges of data driven learning in electronic trading
Not state-of-the-art per se, but an interesting blog post: Using the latest advancements in deep learning to predict stock price movements
One of the papers in the blog post is also interesting: Simple random search provides a competitive approach to reinforcement learning (a minimal sketch of the basic idea follows).
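To show how simple the core idea is, here is a sketch of basic random search over linear policy weights; the ARS variants in the paper add reward scaling and state normalization on top of this, and `rollout_return` plus all sizes here are hypothetical placeholders, not the authors' code:

```python
import numpy as np

def basic_random_search(rollout_return, theta, n_directions=8, nu=0.05,
                        alpha=0.02, n_iters=100):
    """Perturb policy weights theta in random directions and step along
    the reward-weighted average of those directions.

    rollout_return(theta) -> float is assumed to run one episode with the
    given weights and return its total reward (a placeholder, not a real API).
    """
    for _ in range(n_iters):
        deltas = [np.random.randn(*theta.shape) for _ in range(n_directions)]
        step = np.zeros_like(theta)
        for delta in deltas:
            r_plus = rollout_return(theta + nu * delta)
            r_minus = rollout_return(theta - nu * delta)
            step += (r_plus - r_minus) * delta
        theta = theta + alpha / n_directions * step
    return theta
```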
While learning a bit about meta-learning, I came across the topic of Deep Neuroevolution, which belongs to the field of genetic algorithms.
Paper Repro: Deep Neuroevolution
Using Evolutionary AutoML to Discover Neural Network Architectures
Google/DeepMind's new paper "Learning Latent Dynamics for Planning from Pixels" https://github.com/google-research/planet
PlaNet is a purely model-based reinforcement learning algorithm that solves control tasks from images by efficient planning in a learned latent space. PlaNet competes with top model-free methods in terms of final performance and training time while using substantially less interaction with the environment.
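A rough sketch of the planning idea (cross-entropy-method search over action sequences inside a learned latent model); `encode`, `latent_step` and `reward_model` stand in for the learned networks, and all sizes are illustrative, not PlaNet's actual code:

```python
import numpy as np

def cem_plan(encode, latent_step, reward_model, obs, horizon=12,
             n_candidates=100, n_elites=10, n_iters=5, action_dim=2):
    """Plan in latent space: sample action sequences, score them with the
    learned model, and refit a Gaussian to the best ones."""
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        candidates = mean + std * np.random.randn(n_candidates, horizon, action_dim)
        returns = []
        for actions in candidates:
            state, total = encode(obs), 0.0
            for a in actions:
                state = latent_step(state, a)  # learned latent dynamics
                total += reward_model(state)   # learned reward predictor
            returns.append(total)
        elites = candidates[np.argsort(returns)[-n_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0)
    return mean[0]  # execute the first action of the refined plan
```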
Glow blog post: https://openai.com/blog/glow/
Glow paper: https://arxiv.org/pdf/1807.03039.pdf
Real NVP paper: https://arxiv.org/pdf/1605.08803.pdf
An Information-Maximization Approach to Blind Separation and Blind Deconvolution: https://www.researchgate.net/publication/15614030_An_Information-Maximization_Approach_to_Blind_Separation_and_Blind_Deconvolution
ICA paper by Hyvärinen: https://www.cs.helsinki.fi/u/ahyvarin/papers/NN99.pdf