[CODE IMPROVEMENT] Check default RLHF parameters

Open maxjeblick opened this issue 2 years ago • 3 comments

🔧 Proposed code refactoring

Check if our default hyperparameters (e.g. kl_target) are correct, see: https://github.com/lvwerra/trl/commit/b56e8b327733baa81c3ef0d6508f08e1b3e33939 and https://github.com/lvwerra/trl/issues/462

Also, RLHF training is quite unstable w.r.t. parameter choices, see e.g. issues in trl. Try to find good defaults that work for one (or more) of our finetuned models.

maxjeblick, Jun 23 '23 14:06

target_kl is unused currently. No early stopping based on this parameter. Larger minibatches as the default sound good; I got the same impression w.r.t. stability. We would then need additional logic to make that work on larger models without causing OOMs, since the rollout is currently done in a single batched forward pass.

pascal-pfeiffer, Jun 26 '23 06:06
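
For illustration, one way to avoid a single batched forward pass is to split the rollout into smaller chunks and concatenate the results. This is only a minimal sketch, not the h2o-llmstudio implementation: `chunked_rollout`, `generate_fn`, and `chunk_size` are hypothetical names, and `generate_fn` stands in for whatever call produces the model outputs for a batch of prompts.

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")  # prompt type (e.g. str or a tokenized prompt)
R = TypeVar("R")  # rollout result type (e.g. a generated sequence)


def chunked_rollout(
    prompts: Sequence[P],
    generate_fn: Callable[[Sequence[P]], List[R]],  # hypothetical batch generator
    chunk_size: int,
) -> List[R]:
    """Run generate_fn over prompts in chunks of at most chunk_size items.

    Peak memory then scales with chunk_size rather than with the full
    rollout batch, at the cost of more forward passes.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    outputs: List[R] = []
    for start in range(0, len(prompts), chunk_size):
        chunk = prompts[start:start + chunk_size]
        # Each call only sees one chunk; results keep the original order.
        outputs.extend(generate_fn(chunk))
    return outputs
```

With such a helper, the rollout batch size (which seems to matter for stability) could be chosen independently of what fits on the GPU in one pass.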

target_kl is unused currently. No early stopping based on this parameter.

Isn't it used in AdaptiveKLController (as kl_target)?

maxjeblick, Jun 26 '23 07:06

Yes, these are two different params. I just wanted to make sure we are talking about the same one. One is for early stopping (which you also linked), and the other is the target for the controller (which is the interesting one for us).

pascal-pfeiffer, Jun 26 '23 11:06
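
To make the distinction between the two parameters concrete, here is a rough sketch of how they typically play different roles in PPO-style RLHF. It is written from memory of how trl-style code behaves and is not copied from either code base: the clipping range of ±0.2, the `horizon` parameter, and the 1.5 tolerance factor in `should_stop_early` are assumptions for illustration, and `should_stop_early` is a hypothetical helper name.

```python
class AdaptiveKLController:
    """Adapts the KL penalty coefficient so the measured KL drifts towards kl_target."""

    def __init__(self, init_kl_coef: float, kl_target: float, horizon: int):
        self.value = init_kl_coef   # current KL penalty coefficient
        self.kl_target = kl_target  # controller target (the "interesting" parameter)
        self.horizon = horizon      # assumed: controls how fast the coefficient adapts

    def update(self, current_kl: float, n_steps: int) -> None:
        # Proportional error, clipped so one update cannot change the coefficient much.
        error = current_kl / self.kl_target - 1.0
        error = max(-0.2, min(0.2, error))
        self.value *= 1.0 + error * n_steps / self.horizon


def should_stop_early(current_kl: float, target_kl: float, tolerance: float = 1.5) -> bool:
    """Hypothetical early-stopping check: skip further optimisation on a batch
    once the policy KL exceeds tolerance * target_kl (tolerance is an assumption)."""
    return current_kl > tolerance * target_kl
```

In this sketch, `kl_target` only changes how strongly the KL penalty is weighted over time, while `target_kl` acts as a hard threshold. Leaving the latter unused therefore disables early stopping but does not affect the controller.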