HandyRL
HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
So far, the adoption rate in the replay buffer has been linear in `maximum_episodes`, but this means that the earliest episodes will be selected many times before the buffer fills up.
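As a rough illustration of the behavior described above (this is only a sketch, not HandyRL's actual implementation, and the function and argument names are made up):

```python
import random

# Sketch of a linear acceptance rule normalized by the full capacity
# `maximum_episodes` rather than by the number of episodes stored so far.
def select_episode(episodes, maximum_episodes):
    while True:
        idx = random.randrange(len(episodes))
        # The acceptance probability grows linearly with recency, but while
        # the buffer is still filling up it is close to 1 for every stored
        # episode, so the earliest episodes keep being drawn many times.
        accept_rate = 1 - (len(episodes) - 1 - idx) / maximum_episodes
        if random.random() < accept_rate:
            return episodes[idx]
```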
Computing a Nash equilibrium looks wonderful, but some updates would be required.
This is just one possible idea, especially for large-scale training.
There is no clear answer to how `rho` and `c` should be defined. However, in a game like rock-paper-scissors, where the best move depends on the opponent's move, it makes no sense...
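If `rho` and `c` here are the V-trace clipping thresholds from IMPALA (Espeholt et al., 2018), a minimal sketch of where they enter the target computation looks like this (the function and tensor names are made up for the example):

```python
import torch

def vtrace_targets(rewards, values, bootstrap_value, log_rhos,
                   gamma=1.0, rho_bar=1.0, c_bar=1.0):
    # rewards, values, log_rhos: tensors of shape [T];
    # log_rhos are log(pi(a|s) / mu(a|s)) importance ratios.
    rhos = torch.exp(log_rhos)
    clipped_rhos = torch.clamp(rhos, max=rho_bar)  # rho_t = min(rho_bar, ratio)
    clipped_cs = torch.clamp(rhos, max=c_bar)      # c_t   = min(c_bar, ratio)

    next_values = torch.cat([values[1:], bootstrap_value.view(1)])
    deltas = clipped_rhos * (rewards + gamma * next_values - values)

    # Backward recursion:
    # vs_t - V(x_t) = delta_t + gamma * c_t * (vs_{t+1} - V(x_{t+1}))
    vs_minus_v = torch.zeros_like(values)
    acc = torch.zeros(())
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * clipped_cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v
```

Roughly, `rho_bar` determines which value function the targets converge to (closer to the target policy's for larger values), while `c_bar` controls how far corrections propagate backward in time.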
Scaling the learning rate in proportion to the batch size looks strange.
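For reference, the pattern presumably being questioned is the linear scaling rule, where the learning rate is multiplied by the ratio of the batch size to some reference batch size (the numbers and names below are made up):

```python
# Linear scaling rule: lr grows in proportion to the batch size.
base_lr = 3e-4
base_batch_size = 256

def scaled_lr(batch_size):
    return base_lr * batch_size / base_batch_size
```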
For example:
```python
if args.get('show', False):
    self.env.render()
```
- board view
- better neural net?
- preparation for piece color estimation
This is a more generalized version.
In the future, I'd like to remove `prepare_env` and handle entries from each `Gather`, but first I have an idea to remove a useless port opened by the `entry_server`.