Thomas Spooner
Hey, you're absolutely right. I'm not sure why, but this seems to be yet another discrepancy between my private development repo and this public one - as in your issue...
Honestly, I stopped using multi-threaded training quite some time before the main results of the paper were found. It doesn't surprise me much that it is broken. I realise that's...
Yeah, `OnlineRLearn` is the (off-policy) R-learning algorithm described by Sutton and Barto (originally proposed by Schwartz). It's the equivalent of Q-learning for continuing tasks - i.e. it solves for a different objective: the expected...
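For anyone unfamiliar with R-learning, here's a minimal tabular sketch of the textbook update rules (as presented in Sutton and Barto). It is *not* the repo's `OnlineRLearn` class. `TabularRLearner`, `toy_step` and the step sizes `alpha`/`beta` are all hypothetical names/values for illustration. The key difference from Q-learning is that there's no discount factor: rewards are measured relative to a running estimate `rho` of the average reward per step, and `rho` is only adjusted after greedy actions.

```python
import random
from collections import defaultdict


class TabularRLearner:
    """Tabular R-learning for continuing tasks (hypothetical sketch).

    Follows the textbook update rules; this is not the repository's
    OnlineRLearn class, and all names here are illustrative.
    """

    def __init__(self, n_actions, alpha=0.1, beta=0.01):
        self.n_actions = n_actions
        self.alpha = alpha  # step size for the relative action values Q
        self.beta = beta    # step size for the average-reward estimate rho
        self.rho = 0.0      # running estimate of the average reward per step
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def greedy(self, state):
        # Index of the highest-valued action (ties go to the lowest index).
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, s, a, r, s_next):
        q_s = self.q[s]
        max_next = max(self.q[s_next])
        # Undiscounted TD error: the reward is measured relative to rho.
        q_s[a] += self.alpha * (r - self.rho + max_next - q_s[a])
        # rho is only updated when the action taken is greedy in s.
        if q_s[a] == max(q_s):
            self.rho += self.beta * (r - self.rho + max(self.q[s_next]) - max(q_s))


def toy_step(action):
    # Hypothetical one-state continuing task: action 0 pays 1, action 1 pays 0.
    return (1.0 if action == 0 else 0.0), 0


rng = random.Random(0)
agent = TabularRLearner(n_actions=2)
state = 0
for _ in range(5000):
    action = rng.randrange(2)  # uniform random behaviour policy (off-policy learning)
    reward, next_state = toy_step(action)
    agent.update(state, action, reward, next_state)
    state = next_state

print(round(agent.rho, 3), agent.greedy(0))
```

Even though the behaviour policy here is uniformly random, `rho` should approach 1 (the average reward of always picking action 0), and the learned values should make action 0 greedy, since the update targets the greedy policy just as Q-learning does.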
Hey @dichen9412 and @mbasso! Sorry for the delayed response - I've been very busy with follow-up work. First off, I appreciate that there is a lack of documentation with this...