Nathan Lambert

Results: 60 issues by Nathan Lambert

What's the right place to add best-of-n sampling and compare its impact to some existing methods? Some references:
* Discussed in [reward model scaling laws paper](https://arxiv.org/abs/2210.10760),
* OpenAI...

enhancement

Will share results, but these are the experiments for #101, #122, and #121.

I'm comparing the PPO implementation to the OpenAI one and the [implementation details blog post](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/) that goes through it. Wondering if some of these things improve performance. If not, it's...

question

Two changes:
1. Pass the optimizer in the sentiment example (currently the variable was not passed into the trainer).
2. [I think] fix the kwarg option for the wandb config of `Accelerate`. See...

In the toxicity [script](https://github.com/lvwerra/trl/blob/b75d83ab28b59307916beb425207d46406502f11/examples/summarization/scripts/reward_summarization.py) should the `optimizer` be passed to the PPOTrainer -- or omitted? Found this because I'm dealing with optimizer setup for H4 by copying the code over....

Installing the basic package from source with pip does not install the quality / style requirements.

Essentially, how do we register a packaged Simulate environment as a custom Gym environment? E.g. https://stackoverflow.com/questions/52727233/how-can-i-register-a-custom-environment-in-openais-gym

Collider meshes for non-convex polygons currently require rebuilding the polygon out of invisible components, or a more advanced integration of a V-HACD algorithm for reconstructing a non-convex mesh as a convex set...

enhancement

Not sure the best way to handle this in `setup.py`

bug