Bahador Bakhshi

Results 4 comments of Bahador Bakhshi

Inspired by [these explanations](https://www.debugcn.com/en/article/42238722.html), the following implementation works fine for me. 1) Define the `_observation_spec` as a dictionary that contains the actual observation and also the mask of the valid...

Have a look at "Learning Rate Scheduling" in the "Hands On ..." book

Except for the small additional computational cost, Expected Sarsa may completely dominate both of the other more-well-known TD control algorithms.

https://arxiv.org/pdf/1803.09820.pdf