Yawen Duan comments

8 comments by Yawen Duan

Randomness control for different `exploration_frac` in preference comparisons

Here are some empirical results on 1): it seems that adding deterministic trajectories does help the agent learn faster, but I guess if we want to make PPO deterministic, we...

Randomness control for different `exploration_frac` in preference comparisons

> I don't think we want PPO to be deterministic.

I agree. I made a pull request in #423.

> To check I understand the first figure -- higher `exploration_frac` corresponds to...

[Preference Comparison] L2 regularization with dynamic regularization coefficient

Thanks for flagging this! I strongly support adding this feature.

Optimize EMANorm by removing for loop over a batch

I'm not quite sure what a good way would be to test the correctness or convergence of the EMA running stats. I'm mainly unsure about 1. how to set...

Optimize EMANorm by removing for loop over a batch

Thanks for the response! I would be happy to continue this discussion in issue #540.

Set a reasonable default decay rate for EMANorm

Yep, I think #546 resolved this. I will close this issue now.

Building blocks for PEBBLE

Thanks for the implementations!

Benchmarking fails when using Azure OpenAI

Are there any plans to expand the number of connectors for those cookbooks, especially for the MLCommons AI Safety Benchmarks?