Nathan Lambert

Results: 60 issues by Nathan Lambert

Hey! Post any questions or complaints about the dataset here. We'll log our internal goals and limitations here too.

1. It was pointed out by [Rishabh Agarwal](https://agarwl.github.io/) that the PRM Math...

question

Some things to add:

- [ ] Pareto distribution of any Section or Subset

Comment anything else (or just watch my notes)

1. Make it so you can run inference over individual text prompts (rather than chosen + rejected)
2. Clean up nograd/detach (see https://twitter.com/shxf0072/status/1771220126655811610), but should be pretty obvious
3. Add...

enhancement
good first issue

See below! https://huggingface.co/models?library=nemo&search=RM This involves converting to HF format or adding nemo compatibility, if anyone has time!

enhancement
New Model

With the human data AI2 has, or a dataset like `no_robots`, we could test whether an RM prefers the human-written or the model-generated answer for a given prompt.

enhancement
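
A minimal sketch of how such a test could be scored, assuming each example is a `(prompt, human_answer, model_answer)` triple and that the reward model is wrapped in a `score(prompt, answer)` callable. The names `human_preference_rate` and `toy_score` are hypothetical, and the word-count scorer is only a stand-in so the snippet runs without a model.

```python
from typing import Callable, Iterable, Tuple

# (prompt, human_answer, model_answer); this layout is an assumption for the sketch.
Example = Tuple[str, str, str]


def human_preference_rate(
    examples: Iterable[Example],
    score: Callable[[str, str], float],
) -> float:
    """Return the fraction of examples where the human answer scores higher.

    Ties are counted as half a win for each side.
    """
    wins = 0.0
    total = 0
    for prompt, human_answer, model_answer in examples:
        human_score = score(prompt, human_answer)
        model_score = score(prompt, model_answer)
        if human_score > model_score:
            wins += 1.0
        elif human_score == model_score:
            wins += 0.5
        total += 1
    if total == 0:
        raise ValueError("no examples to evaluate")
    return wins / total


def toy_score(prompt: str, answer: str) -> float:
    # Placeholder scorer (word count). A real run would replace this with
    # the reward model's scalar output for the (prompt, answer) pair.
    return float(len(answer.split()))


examples = [
    ("Q1", "short human answer", "a much longer model answer here"),
    ("Q2", "a detailed human answer with context", "brief"),
]
rate = human_preference_rate(examples, toy_score)
print(f"RM preferred the human answer in {rate:.0%} of examples")
```

A rate well above or below 50% would suggest the RM systematically favors one source, though confounds such as answer length would need to be checked separately.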

1. Take a few chat models as the “base set”, say 1-3, like tulu 2 7b and tulu 2 13b (maybe olmo-instruct)
2. Generate ~8 completions per prompt in AlpacaEval...

Hey all! Drop a comment if you want to contribute, and introduce yourself if you'd like.

Some to-dos, feel free to add more!

- [ ] Add readme of examples
- [ ] Convert examples to markdown, maybe put in folders and render with a Readme?

documentation

- [x] Leading spaces in text cause weird stuff before speaking, remove them
- [x] Things like $1mil are hard to filter but don't work, maybe an LLM can rephrase...