trlx
Benchmark suite
We should use the RL4LMs benchmark suite. I think it is a strong candidate for showing the strengths and weaknesses of TRLX.
Ideas for tasks
- web searching: wikipedia race
- chess
  - A chess DT is not trained on natural language; it’s trained on a formal language encoding chess moves. So if the feedback your Stockfish provides is the sequence of moves that refutes the line the DT was proposing, that is actually “natural language feedback” in the context of the toy task.
- summarization
- sentiments
- HHH data
- GRUE benchmark
- Prefer one model's outputs and train a new model on those preferences, to see how long it takes to start imitating the preferred model
- Out of the box usability: do I have to "prompt" my models less?
- How well does the model incorporate feedback?
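For the chess idea above, here is a minimal sketch of what "feedback in the model's own language" could look like. It assumes moves are written as UCI strings (e.g. `e2e4`) and that the refuting line has already been obtained from an engine such as Stockfish. The function name `refutation_feedback` and the ` | ` separator are hypothetical choices for illustration, not something trlx or Stockfish defines.

```python
from typing import List


def refutation_feedback(proposed: List[str], refutation: List[str]) -> str:
    """Render engine feedback as a plain sequence of UCI moves.

    `proposed` is the line the model suggested; `refutation` is the engine's
    refuting continuation. Both are lists of UCI move strings, so the output
    stays in the same formal move language the model is trained on.
    """
    if not refutation:
        raise ValueError("refutation must contain at least one move")
    # Hypothetical format: proposed line, a separator, then the refutation.
    return " ".join(proposed) + " | " + " ".join(refutation)


# Toy example: White plays f3 and g4, Black refutes with Qh4 (fool's mate).
feedback = refutation_feedback(["f2f3", "e7e5", "g2g4"], ["d8h4"])
print(feedback)
```

Keeping the feedback in move notation means no extra tokenizer or vocabulary is needed; whether a separator token helps is something the benchmark itself could test.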
Summarization Data with Human Feedback from OpenAI: https://github.com/openai/summarize-from-feedback#human-feedback-data
@Dahoas
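As a rough sketch of how the comparison data linked above could be turned into preference pairs for reward-model training: the snippet below assumes each record has an `info.post` field, a two-element `summaries` list with `text` entries, and a `choice` index for the preferred summary. That layout is an assumption here and should be checked against the repository's README; `to_preference_pair` is a hypothetical helper, not part of any library.

```python
from typing import Any, Dict, Tuple


def to_preference_pair(record: Dict[str, Any]) -> Tuple[str, str, str]:
    """Convert one pairwise comparison into (prompt, chosen, rejected).

    Assumed record layout (verify against the dataset documentation):
      record["info"]["post"]       -> the text to be summarized
      record["summaries"][i]["text"] -> the two candidate summaries
      record["choice"]             -> index (0 or 1) of the preferred summary
    """
    summaries = record["summaries"]
    choice = record["choice"]
    if len(summaries) != 2 or choice not in (0, 1):
        raise ValueError("expected exactly two summaries and a choice of 0 or 1")
    prompt = record["info"]["post"]
    chosen = summaries[choice]["text"]
    rejected = summaries[1 - choice]["text"]
    return prompt, chosen, rejected


# Toy record, made up for illustration only.
example = {
    "info": {"post": "My cat knocked over a plant and now refuses to apologize."},
    "summaries": [{"text": "Cat broke plant."}, {"text": "A plant exists."}],
    "choice": 0,
}
print(to_preference_pair(example))
```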
@PhungVanDuy @Dahoas I'd love to help on the summarization task, what's the current status?
Duy has implemented it and apparently has it working.
I have an implementation here, but it's pretty messy right now. I'm reviewing some results and doing hyperparameter tuning. If it works well, I will clean up the code then.