Khanh Nguyen
Results: 3 comments
Thanks, great job!
Always returning tail sums may not be optimal. Sometimes you want to do something with the rewards first, such as rescaling, injecting noise, or exponentiating. I think it is better to return per-step...
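To illustrate the point, here is a minimal sketch, not the code under review. The helper name `tail_sums` and its `discount` parameter are hypothetical. It assumes the environment hands back per-step rewards, so the caller can transform them first and compute tail sums (reward-to-go) only when needed.

```python
import numpy as np


def tail_sums(rewards, discount=1.0):
    """Hypothetical helper: (discounted) sum of rewards from each step to the end."""
    rewards = np.asarray(rewards, dtype=float)
    out = np.zeros_like(rewards)
    running = 0.0
    # Walk backwards so each entry accumulates everything after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + discount * running
        out[t] = running
    return out


# Per-step rewards, as the caller would receive them (made-up values).
per_step = np.array([1.0, 0.0, 2.0])

# The caller is free to transform the per-step rewards before summing.
exp_then_sum = tail_sums(np.exp(per_step))

# With only tail sums available, the best one could do is transform the sums,
# which is a different quantity for a nonlinear transform such as exp.
sum_then_exp = np.exp(tail_sums(per_step))

print(tail_sums(per_step))
print(np.allclose(exp_then_sum, sum_then_exp))
```

Per-step rewards can always be turned into tail sums with one backward pass, while undiscounted tail sums can only be turned back into per-step rewards by differencing, which is an extra step every caller would have to repeat.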
It works, please commit it!