Yawen Duan

Results: 8 comments by Yawen Duan

Here are some empirical results on 1): It seems that adding deterministic trajectories does help the agent learn faster, but I guess if we want to make PPO deterministic, we...

> I don't think we want PPO to be deterministic.

I agree. Pull request made in #423.

> To check I understand the first figure -- higher exploration_frac corresponds to...

Thanks for flagging this! I very much support adding this feature.

I'm not quite sure what a good way would be to test the correctness or convergence of the EMA running stats. I'm mainly unsure about 1. How to set...

Thanks for the response! I would be happy to continue this discussion in issue #540.

Yep, I think #546 resolved this. I will close this issue now.

Thanks for the implementation!

Are there any plans to expand the number of connectors for those cookbooks, especially for the MLCommons AI Safety Benchmarks?