Nathan Lambert
I'm not a vLLM contributor (at least not a heavy one; I may have had a PR I don't remember), but I'm a heavy reward model user and a heavy infrastructure builder (you...
@zhuzilin I think the initial implementation is good at a quick pass. It covers the biggest things. (mostly acknowledging that I did, but without using it I am unlikely to...
Hey @RobinsonKO -- looking into this. For example, the sort of command I ran is this -- **note I didn't check the exact hyperparameters, copied from one of the SFT...
Can you say more, @RobinsonKO? I bet the OLMES repo has become a bit outdated relative to our setup (you can see it's not updated often), so it'll be hard...
@lyybonnie can you say more? Link 404s