Implement RM Mode for DatasetEntry
Currently the `DatasetEntry` class cannot be used for reward model training; a `NotImplementedError` is raised (see https://github.com/LAION-AI/Open-Assistant/blob/main/model/model_training/custom_datasets/formatting.py#L88). This should be fixed as soon as possible. The response structure can be viewed here.
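For reference, here is a minimal sketch of what the RM branch of the formatting could look like. The field names `questions`/`answers`, the method name `get_formatted_rm`, and the special tokens are assumptions for illustration, not the current code:

```python
# Hypothetical sketch (not the actual formatting.py code) of an RM-mode
# formatter for DatasetEntry: the prefix is the conversation so far, the
# replies are the ranked answers to be scored against that prefix.
from dataclasses import dataclass, field

QA_SPECIAL_TOKENS = {"Question": "<|prompter|>", "Answer": "<|assistant|>"}  # assumed tokens


@dataclass
class DatasetEntry:
    questions: list[str] = field(default_factory=list)
    answers: list[str] = field(default_factory=list)  # ranked best-to-worst in RM mode

    def get_formatted_rm(self, eos_token: str) -> tuple[str, list[str]]:
        # Prefix: all prompter turns of the conversation, concatenated in order.
        prefix = "".join(
            f"{QA_SPECIAL_TOKENS['Question']}{q}{eos_token}" for q in self.questions
        )
        # Replies: each ranked answer wrapped in the assistant token, so the
        # ranking collator can pair every reply with the same prefix.
        replies = [
            f"{QA_SPECIAL_TOKENS['Answer']}{a}{eos_token}" for a in self.answers
        ]
        return prefix, replies
```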
- [ ] support that `answers` can be of type `list[list[str]]` (since we need multiple answers for each question) (not sure that we need this)
- [ ] implement support for `DatasetEntry` in `RankingCollator` (see the sketch below)
- [ ] add a test in `test_formatting.py`
- [ ] optional: implement `DatasetEntry` support for the `WebGPT` dataset
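For the `RankingCollator` item, one possible shape of the change, sketched under the assumption that a formatted entry yields a `(prefix, replies)` pair (the method name `get_formatted_rm` is hypothetical and matches the sketch above):

```python
# Hypothetical helper for RankingCollator.__call__: normalize both the legacy
# tuple format and a DatasetEntry into (prefix_text, ranked_reply_texts).
def to_prefix_and_replies(example, eos_token: str) -> tuple[str, list[str]]:
    if hasattr(example, "get_formatted_rm"):
        # DatasetEntry path (method name is an assumption, see the sketch above)
        return example.get_formatted_rm(eos_token)
    # Legacy path: a (prefix_turns, ranked_replies) pair of string lists
    prefix_turns, replies = example
    return "".join(prefix_turns), list(replies)
```

The collator could then tokenize the prefix once per example and pair it with each ranked reply before padding the batch.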
I'll take this issue for now, but feel free to join the effort.
One thing I noticed is that we treat the max length differently for the prefix and the suffix in the RankingCollator, probably because it's better to lose text at the beginning or the end of a conversation than in the middle. But this is different from how we've implemented this for a couple of SFT datasets (e.g. here). Maybe we need a separate issue to align this. I'd prefer the RM approach.
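To make the difference concrete, here is a small illustration of the RM-style truncation (the model name and length limits are placeholders, not the actual collator code): the prefix is truncated from the left so the most recent turns survive, while each reply is truncated from the right.

```python
# Illustration only: left-truncate the conversation prefix, right-truncate the reply.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model


def encode_pair(prefix: str, reply: str, max_prefix_len: int = 512, max_reply_len: int = 512):
    tokenizer.truncation_side = "left"   # drop the oldest turns of the conversation
    prefix_ids = tokenizer(prefix, truncation=True, max_length=max_prefix_len)["input_ids"]
    tokenizer.truncation_side = "right"  # drop the tail of an overly long reply
    reply_ids = tokenizer(reply, truncation=True, max_length=max_reply_len)["input_ids"]
    return prefix_ids + reply_ids
```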
All relevant tasks are done.