Nathan Lambert
I think I have updated all the unused parameters throughout the documentation changes. Re @Hafiidz's point, I think that is because the config on the hub is missing the entry?...
What's BDDM?
Thanks! Feel free to ping me for examples. May not respond immediately because we're all busy but would like to help.
We use a mix (which is a mess). Here's an example with the records orient: https://huggingface.co/datasets/allenai/reward-bench-results/blob/main/best-of-n/alpaca_eval/tulu-13b/OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5.json There are more in that folder, ~40 MB maybe?
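For anyone who wants to poke at files like this locally, here is a minimal sketch of reading records-oriented JSON with one object per line, using only the standard library. The field names follow the records in this thread; the second record's `id` and `scores` values, the helper name `read_records`, and the one-object-per-line layout are assumptions for illustration, not a description of how the dataset is actually loaded.

```python
import json


def read_records(text):
    """Parse newline-delimited JSON: one record (dict) per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Two sample records. The first mirrors the fields shared in this thread;
# the second is hypothetical and exists only to make the example non-trivial.
sample = "\n".join([
    '{"config": "top_p=0.9;temp=1.0", "dataset_details": "helpful_base", '
    '"id": [0, 0], "model": "allenai/tulu-2-dpo-13b", "scores": 3.076171875}',
    '{"config": "top_p=0.9;temp=1.0", "dataset_details": "helpful_base", '
    '"id": [0, 1], "model": "allenai/tulu-2-dpo-13b", "scores": 1.5}',
])

records = read_records(sample)

# Pick the highest-scoring record, as a best-of-n style selection would.
best = max(records, key=lambda r: r["scores"])
print(len(records), best["id"], best["scores"])
```

Reading line by line keeps memory use flat for larger files and sidesteps the question of whether the whole file is a single valid JSON document.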
@albertvillanova here's a snippet so you don't need to click ``` { "config": "top_p=0.9;temp=1.0", "dataset_details": "helpful_base", "id": [ 0, 0 ], "model": "allenai/tulu-2-dpo-13b", "scores": 3.076171875 } { "config": "top_p=0.9;temp=1.0", "dataset_details":...
@yuchenlin is starting this, woohoo!
Partially closed in #30, wrapping up soon.
Yeah, so something weird is going on with a simultaneous large drop in entropy, clip fraction, etc. Can we log the model outputs at that step? Is there any chance...
@younesbelkada your idea makes sense. Some follow-ups: 1. @lvwerra what experiment setup was this? I'd love to dig further. 2. What does a clip frac of .55 mean, is...
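For reference on the clip fraction question, a rough sketch of one common definition: the share of samples whose new-to-old policy probability ratio falls outside `[1 - clip_range, 1 + clip_range]`. The function name, the `clip_range=0.2` default, and the sample log-probabilities are assumptions for illustration, and the exact statistic logged in the run discussed here may be computed differently.

```python
import math


def clip_fraction(new_logprobs, old_logprobs, clip_range=0.2):
    """Fraction of samples whose probability ratio lies outside the clip range."""
    # ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities
    ratios = [math.exp(n - o) for n, o in zip(new_logprobs, old_logprobs)]
    clipped = [abs(r - 1.0) > clip_range for r in ratios]
    return sum(clipped) / len(clipped)


# Hypothetical per-token log-probs under the old and updated policies.
old = [-1.0, -1.0, -1.0, -1.0]
new = [-1.0, -0.5, -1.05, -2.0]

frac = clip_fraction(new, old)
print(frac)  # two of the four ratios fall outside [0.8, 1.2]
```

Under this definition, a clip fraction of .55 would mean a bit more than half of the samples in the batch moved far enough from the old policy for the clipping to kick in.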
I think having it in the readme too can be nice. The docs are closer with the various RL options too.