policy-gradient-importance-sampling
Policy gradient reinforcement learning algorithm with importance sampling
Importance sampling is a technique for estimating the expectation of a function under a distribution p(x) using samples drawn from another distribution q(x). In policy gradient methods, it lets the agent use off-policy samples (trajectories T = (s1, a1, s2, a2, ...) collected under an old policy) to update the current policy. The importance-sampled policy gradient can therefore reuse previous samples for training, which leads to faster convergence. A short sketch of such an update is shown below.
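The following is a minimal, illustrative sketch (not the repo's actual main.py) of an importance-weighted REINFORCE-style update in PyTorch; the network shape, optimizer, and the `states`/`actions`/`returns`/`old_log_probs` tensors are assumptions made for the example.

```python
import torch
import torch.nn as nn

# Small policy network for CartPole-v0: 4 observation dims, 2 discrete actions.
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

def update(states, actions, returns, old_log_probs):
    """One off-policy gradient step on samples collected by an older policy.

    states:        (T, 4) float tensor of observations
    actions:       (T,)   long tensor of actions taken by the old policy
    returns:       (T,)   float tensor of discounted returns G_t
    old_log_probs: (T,)   log pi_old(a_t | s_t), recorded at collection time
    """
    dist = torch.distributions.Categorical(logits=policy(states))
    new_log_probs = dist.log_prob(actions)

    # Importance weights rho_t = pi_new(a_t|s_t) / pi_old(a_t|s_t) reweight the
    # off-policy samples so the gradient estimates the expectation under pi_new.
    rho = torch.exp(new_log_probs - old_log_probs).detach()

    # REINFORCE-style surrogate loss, scaled by the importance weights.
    loss = -(rho * new_log_probs * returns).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the importance weights correct for the mismatch between the old and current policies, the same stored trajectories can be replayed for several updates instead of being discarded after one on-policy step.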

An experiment on gym's CartPole-v0 shows that the importance-sampled variant requires fewer episodes to learn an optimal policy.
Requirements
- Python
- numpy
- pytorch
- gym
- matplotlib
Usage
$ python main.py (add --reinforce to run the plain REINFORCE example instead)