Nathan Lambert
What's the right place to add best-of-n sampling and compare its impact to some existing methods? Some references: * Discussed in [reward model scaling laws paper](https://arxiv.org/abs/2210.10760), * OpenAI...
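For context, best-of-n sampling is simple enough to sketch without any TRL machinery: sample n candidate completions and keep the one the reward model scores highest. A minimal, dependency-free sketch, using toy stand-ins for the generator and reward model (the real versions would be `model.generate` and a trained reward model; the names below are illustrative):

```python
from itertools import cycle

def best_of_n(prompt, generate, reward, n=4):
    """Draw n candidate completions for `prompt` and return the one
    with the highest reward score (ties broken by first occurrence)."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins for a language model and a reward model (assumptions:
# any callables with these shapes work here, not Simulate/TRL APIs).
_options = cycle(["good", "okay", "great"])
toy_generate = lambda prompt: prompt + " " + next(_options)
toy_reward = lambda text: {"good": 1.0, "okay": 0.5, "great": 2.0}[text.split()[-1]]

print(best_of_n("The movie was", toy_generate, toy_reward, n=3))
# -> The movie was great
```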
Will share results, but experiments for #101 #122 #121
I'm comparing the PPO implementation to the OpenAI one and the [implementation details blog post](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/) that goes through it. Wondering if some of these things improve performance. If not, it's...
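As an illustration of the kind of detail being compared, here is a dependency-free sketch of one item the blog post catalogues, per-batch advantage normalization (whitening advantages before the policy loss); whether this particular detail moves the needle here is exactly the open question:

```python
import statistics

def whiten(advantages, eps=1e-8):
    """Per-batch advantage normalization: subtract the batch mean and
    divide by the batch std before computing the PPO policy loss."""
    mean = statistics.fmean(advantages)
    std = statistics.pstdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]

print(whiten([1.0, 2.0, 3.0]))
```

The whitened batch has zero mean and unit variance, which keeps the policy-gradient scale stable across batches.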
Two changes: 1. Pass the optimizer in the sentiment example (currently the variable was not passed into the trainer). 2. [I think] fix the kwarg option for the wandb config of `Accelerate`. See...
In the toxicity [script](https://github.com/lvwerra/trl/blob/b75d83ab28b59307916beb425207d46406502f11/examples/summarization/scripts/reward_summarization.py) should the `optimizer` be passed to the PPOTrainer -- or omitted? Found this because I'm dealing with optimizer setup for H4 by copying the code over....
Installing the basic package from source with pip does not install the quality/style requirements.
Essentially, how do we register a custom Gym environment for a packaged Simulate environment? E.g. https://stackoverflow.com/questions/52727233/how-can-i-register-a-custom-environment-in-openais-gym
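For reference, the usual pattern the linked answer describes is to call Gym's `register` in the package's top-level `__init__.py`, so the registration runs whenever the package is imported. A sketch under that assumption (package and class names below are illustrative, not Simulate's actual API):

```python
# my_sim_package/__init__.py
# Registering here means `import my_sim_package` makes the env id
# available to gym.make (names are hypothetical placeholders).
from gym.envs.registration import register

register(
    id="MySimEnv-v0",                            # id users pass to gym.make
    entry_point="my_sim_package.envs:MySimEnv",  # "module.path:ClassName"
    max_episode_steps=200,
)
```

After `import my_sim_package`, `gym.make("MySimEnv-v0")` resolves the entry point; the open question is where this registration hook should live for a packaged Simulate environment.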
Collider meshes for non-convex polygons currently require either rebuilding the polygon out of invisible components, or an advanced integration of a V-HACD algorithm to decompose a non-convex mesh into a convex set...
Not sure of the best way to handle this in `setup.py`.
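One common way to handle dev-only tooling in `setup.py` is setuptools' `extras_require`, so the default install stays lean and contributors opt in explicitly. A sketch, assuming this is the direction wanted (package name and dependency lists below are illustrative):

```python
# setup.py sketch: runtime deps go in install_requires; style/quality
# tools become an optional extra instead of a default requirement.
from setuptools import setup, find_packages

setup(
    name="simulate",
    packages=find_packages(),
    install_requires=["numpy"],  # runtime dependencies only
    extras_require={
        "quality": ["black", "flake8", "isort"],  # dev-only tools
    },
)
```

Contributors would then run `pip install -e ".[quality]"` to get the style requirements, while a plain `pip install .` from source skips them.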