Yawen Duan
Here are some empirical results on 1): adding deterministic trajectories does seem to speed up learning, but I guess if we want to make PPO deterministic, we...
> I don't think we want PPO to be deterministic.

I agree. Pull request made in #423.

> To check I understand the first figure -- higher exploration_frac corresponds to...
Thanks for flagging this! I very much support adding this feature.
I'm not quite sure what a good way would be to test the correctness or convergence of the EMA running stats. I'm mainly unsure about 1. How to set...
Thanks for the response! I would be happy to continue this discussion in issue #540.
Yep, I think #546 resolved this. I will close this issue now.
Thanks for the implementations!
Are there any plans to expand the number of connectors for those cookbooks? Especially for the MLCommons AI Safety Benchmarks?