Machine-Learning-6.867-homework
Machine-Learning-6.867-homework copied to clipboard
policy gradient
Start thinking about policy gradient methods. What would we need to implement them? What should be the parametric form of the policy?