rl_algorithms
G-learning, test with infinite horizon
It turns out that the G-learning paper doesn't use the episodic setting (at least not for the cliff-world experiment, which is my main concern). Let's write a new cliff-world environment that isn't episodic and see whether it reproduces their results.
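A minimal sketch of what a non-episodic cliff-world could look like. Everything specific here is an assumption rather than something taken from the paper: the class name `ContinuingCliffWorld`, the 4x12 grid, the reward values, and the choice to teleport the agent back to the start after both a fall and a goal visit. The only point it illustrates is that `step` never returns a `done` flag, so the interaction is one infinite stream.

```python
class ContinuingCliffWorld:
    """Cliff-world without terminal states (hypothetical sketch).

    Layout and rewards are assumptions, not the paper's exact setup:
    the bottom row holds the start (left corner), the goal (right corner)
    and the cliff cells in between.
    """

    # Action index -> (row delta, col delta): up, right, down, left.
    MOVES = ((-1, 0), (0, 1), (1, 0), (0, -1))

    def __init__(self, n_rows=4, n_cols=12, step_reward=-1.0,
                 cliff_reward=-100.0, goal_reward=0.0):
        self.n_rows, self.n_cols = n_rows, n_cols
        self.step_reward = step_reward
        self.cliff_reward = cliff_reward
        self.goal_reward = goal_reward
        self.start = (n_rows - 1, 0)
        self.goal = (n_rows - 1, n_cols - 1)
        self.n_states = n_rows * n_cols
        self.n_actions = len(self.MOVES)
        self.pos = self.start

    def _index(self, pos):
        # Flatten (row, col) into a single state index for tabular methods.
        return pos[0] * self.n_cols + pos[1]

    def _is_cliff(self, pos):
        return pos[0] == self.n_rows - 1 and 0 < pos[1] < self.n_cols - 1

    def reset(self):
        self.pos = self.start
        return self._index(self.pos)

    def step(self, action):
        """Apply an action and return (next_state, reward); there is no done flag."""
        dr, dc = self.MOVES[action]
        # Moves that would leave the grid are clipped to the border.
        row = min(max(self.pos[0] + dr, 0), self.n_rows - 1)
        col = min(max(self.pos[1] + dc, 0), self.n_cols - 1)
        pos = (row, col)
        if self._is_cliff(pos):
            # Falling is penalised, then the agent continues from the start.
            reward, pos = self.cliff_reward, self.start
        elif pos == self.goal:
            # Reaching the goal also sends the agent back instead of terminating.
            reward, pos = self.goal_reward, self.start
        else:
            reward = self.step_reward
        self.pos = pos
        return self._index(pos), reward
```

Since there are no episode boundaries, a learner would call `reset()` once and then `step()` for as many steps as the experiment needs, relying on a discount factor below 1 to keep the values finite.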