Benchmark suite

Open LouisCastricato opened this issue 2 years ago • 5 comments

We should use the RL4LMs benchmark suite. I think it is a strong candidate for showing the strengths and weaknesses of TRLX.

LouisCastricato avatar Oct 06 '22 16:10 LouisCastricato

Ideas for tasks

  • web searching: wikipedia race
  • chess
    • A chess DT is not trained on natural language; it's trained on a formal language encoding chess moves. So if the feedback Stockfish provides is the sequence of moves that refutes the line the DT was proposing, that is actually “natural language feedback” in the context of the toy task
  • summarization
  • sentiments
  • HHH data
  • GRUE benchmark
  • Express preferences for one model's outputs, then train a new model to see how long it takes to start imitating the preferred model
  • Out of the box usability: do I have to "prompt" my models less?
  • How well does the model incorporate feedback?

Dahoas avatar Oct 10 '22 17:10 Dahoas

Summarization Data with Human Feedback from OpenAI: https://github.com/openai/summarize-from-feedback#human-feedback-data

@Dahoas

PhungVanDuy avatar Oct 11 '22 19:10 PhungVanDuy

@PhungVanDuy @Dahoas I'd love to help on the summarization task, what's the current status?

thedch avatar Oct 31 '22 17:10 thedch

Duy has implemented it and apparently has it working.

LouisCastricato avatar Oct 31 '22 17:10 LouisCastricato

> @PhungVanDuy @Dahoas I'd love to help on the summarization task, what's the current status?

I have an implementation here, but it's pretty messy right now. I'm reviewing some results and doing hyperparameter tuning. If it works well, I will clean up the code.

PhungVanDuy avatar Oct 31 '22 17:10 PhungVanDuy