Open-Assistant icon indicating copy to clipboard operation
Open-Assistant copied to clipboard

Implement RM Mode for DatasetEntry

Open CloseChoice opened this issue 2 years ago • 1 comments

Currently the DatasetEntry class cannot be used for reward model training, and a NotImplementedError is thrown, see https://github.com/LAION-AI/Open-Assistant/blob/main/model/model_training/custom_datasets/formatting.py#L88. This should be fixed as soon as possible. The response structure can be viewed here.

  • [ ] support that answers can be of type list[list[str]] (since we need multiple answers for each question) (not sure that we need this)
  • [ ] implement support for DatasetEntry in RankingCollator
  • [ ] add test in test_formatting.py
  • [ ] optional: implement DatasetEntry support for WebGPT dataset

I take this issue for now, but feel free to join the effort.

CloseChoice avatar Apr 21 '23 21:04 CloseChoice

One thing I noticed is that we treat the max length different for the prefix and the suffix the RankingCollator, probably because it's better to have text lost at the beginning at the end of a conversation than in the the middle. But this is different to how we've implement this for a couple of SFT datasets (e.g. here). Maybe we need a seperate issue to align this. I'd prefer the RM approach.

CloseChoice avatar Apr 22 '23 05:04 CloseChoice

all relevant tasks are done.

CloseChoice avatar Apr 24 '23 12:04 CloseChoice