Nathan Lambert
Hey! Post any questions or complaints on the dataset. We'll log our internal goals and limitations here too.

1. It was pointed out by [Rishabh Agarwal](https://agarwl.github.io/) that the PRM Math...
Some things to add:

- [ ] Pareto distribution of any Section or Subset

Comment anything else (or just watch my notes).
1. Make it so you can run inference over individual text prompts (rather than chosen + rejected)
2. Clean up nograd/detach (see https://twitter.com/shxf0072/status/1771220126655811610), but should be pretty obvious
3. Add...
See below! https://huggingface.co/models?library=nemo&search=RM This involves converting to HF format or adding nemo compatibility, if anyone has time!
With the human data AI2 has, or a dataset like `no_robots`, we could test whether an RM prefers the human or the model answer as the completion for a given prompt.
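The comparison described above could be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: `human_preference_rate`, `score_fn`, and the toy `length_score` scorer are hypothetical names introduced here, and a real run would replace `length_score` with a call into an actual reward model.

```python
from typing import Callable, Iterable, Tuple


def human_preference_rate(
    examples: Iterable[Tuple[str, str, str]],
    score_fn: Callable[[str, str], float],
) -> float:
    """Fraction of prompts where the scorer rates the human answer strictly higher.

    Each example is (prompt, human_answer, model_answer).
    `score_fn(prompt, completion)` is a hypothetical stand-in for a reward model.
    """
    wins = 0
    total = 0
    for prompt, human_answer, model_answer in examples:
        total += 1
        if score_fn(prompt, human_answer) > score_fn(prompt, model_answer):
            wins += 1
    if total == 0:
        raise ValueError("no examples provided")
    return wins / total


def length_score(prompt: str, completion: str) -> float:
    """Toy scorer (assumption for illustration only): longer completions score higher."""
    return float(len(completion))


examples = [
    ("What is 2+2?", "4", "The answer is 4."),
    ("Name a color.", "Blue is a color.", "Blue"),
]
rate = human_preference_rate(examples, length_score)
print(f"human answer preferred on {rate:.0%} of prompts")
```

Ties count against the human answer here; whether ties should count as half a win is a design choice left open.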
1. Take a few chat models as the “base set”, say 1-3, like tulu 2 7b and tulu 2 13b (maybe olmo-instruct)
2. Generate ~8 completions per prompt in AlpacaEval...
Hey all! Drop a comment if you want to contribute, and introduce yourself if you like.
Some todos, feel free to add more!

- [ ] Add readme of examples
- [ ] Convert examples to markdown, maybe put in folders and render with a Readme?
- [x] Leading spaces in text cause weird stuff before speaking, remove them
- [x] Things like $1mil are hard to filter but don't work, maybe an LLM can rephrase...
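
For the leading-spaces item, one possible cleanup step is sketched below. It is only an assumption about how the fix might look; `strip_leading_whitespace` is a hypothetical helper, not a function from the project.

```python
def strip_leading_whitespace(text: str) -> str:
    """Remove leading spaces and tabs from every line of the input text.

    Note: splitlines() drops a trailing newline, so the result never ends with one.
    """
    return "\n".join(line.lstrip() for line in text.splitlines())


cleaned = strip_leading_whitespace("  Hello there.\n\tSecond line.")
print(cleaned)
```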