Bahador Bakhshi
Bahador Bakhshi
Inspired by [these explanations](https://www.debugcn.com/en/article/42238722.html), the following implementation works fine for me. 1) Define the `_observation_spec` as a dictionary that contains the actual observation and also the mask of the valid...
Have a look at "Learning Rate Scheduling" in the "Hands On ..." book
Except for the small additional computational cost, Expected Sarsa may completely dominate both of the other more-well-known TD control algorithms.
https://arxiv.org/pdf/1803.09820.pdf