Nathan Lambert
Just wanted to update everyone in the issue that @bglick13 (mostly) and I made a lot of progress on this in #818 -- feel free to take a...
@luisenp do you have checkpoints of the models used in the benchmarking efforts of the paper? Or do I have to re-train to get them? Thanks for checking! (those will...
Generally my goal for this PR would be to get the code working (actually the easy part), and say you can easily run a couple agents that are pretrained with...
Closed with #178
Yeah @lvwerra it would just be an example / documentation addition I bet. Or, a more advanced option would be to explain the differences a bit for people too.
Haven't done it, happy to review your PR if you make one. Generally, I had written out pseudo code here
```
query_tensors = [query_tensor] * batch_size
model.generate(
    query_tensors,
    return_prompt=training_args.return_prompt,
    generation_config=generation_config,
)
batch["response"] = ...
```
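For context, here's a minimal standalone sketch of what batched generation can look like with plain `transformers` (not TRL's `PPOTrainer.generate`, which the pseudocode above points at); the checkpoint name and prompts are placeholders:
```python
# Minimal sketch of batched generation, assuming a decoder-only HF model;
# the checkpoint name and prompts below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_name = "gpt2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
tokenizer.padding_side = "left"            # left-pad for decoder-only generation
model = AutoModelForCausalLM.from_pretrained(model_name)

query_texts = ["Tell me a joke.", "Summarize RLHF in one sentence."]
inputs = tokenizer(query_texts, return_tensors="pt", padding=True)

generation_config = GenerationConfig(
    max_new_tokens=32,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

with torch.no_grad():
    outputs = model.generate(**inputs, generation_config=generation_config)

# Keep only the newly generated tokens, i.e. strip the (left-padded) prompt.
prompt_len = inputs["input_ids"].shape[1]
responses = tokenizer.batch_decode(outputs[:, prompt_len:], skip_special_tokens=True)
print(responses)
```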
It could be good to make things like this configurable in a branch and learn how these implementation details transfer to RLHF.
Yeah, I'm running residual clipping example(s), we'll see. At least it'll be good to have the option to try both.
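For reference, the "residual clipping" being toggled here is, as I understand it, the clipped value-function update common in PPO implementations; whether that matches this run exactly is an assumption, and the names and `cliprange_value` below are illustrative:
```python
# Hedged sketch of a clipped value-function loss: the new value prediction is
# only allowed to move within a band around the old prediction (the "residual"
# is clipped), and the pessimistic (max) of the two losses is taken.
import torch

def clipped_value_loss(values, old_values, returns, cliprange_value=0.2):
    values_clipped = old_values + torch.clamp(
        values - old_values, -cliprange_value, cliprange_value
    )
    loss_unclipped = (values - returns) ** 2
    loss_clipped = (values_clipped - returns) ** 2
    return 0.5 * torch.mean(torch.maximum(loss_unclipped, loss_clipped))
```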
Residual value prediction didn't help with stability (that's the crimson-wish run).
The other approx KL formulation also wasn't a big help. W&B run [here](https://wandb.ai/natolambert/TRL/runs/e1258rd6?workspace=user-natolambert). Though it's slightly more stable? We'll see how this run finishes converging.
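For reference, two common per-token approximate KL estimators seen in RLHF-style PPO code (the usual k1/k3 formulations); which exact formulation the run above swapped in isn't stated here, so this is only an illustrative sketch:
```python
# Illustrative only: approximate KL between the policy and a frozen reference
# model, computed from per-token log-probs of sampled tokens.
import torch

def approx_kl_k1(logprobs: torch.Tensor, ref_logprobs: torch.Tensor) -> torch.Tensor:
    # k1 estimator: log(pi / ref); unbiased but noisy and can go negative per token.
    return logprobs - ref_logprobs

def approx_kl_k3(logprobs: torch.Tensor, ref_logprobs: torch.Tensor) -> torch.Tensor:
    # k3 estimator: (r - 1) - log(r) with r = ref / pi; non-negative, lower variance.
    log_ratio = ref_logprobs - logprobs
    return torch.exp(log_ratio) - 1.0 - log_ratio
```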