Open-Assistant
OA dataset consistent splits #1661
:x: pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md.
Not exactly what we had in mind. Can you check #1661 again?
Hi, can you please clarify what you had in mind? In #1661 we discussed that the 'sft' and 'reward' datasets needed to use the same split but 'rl' should be different. I think that is what I have done, but maybe I misinterpreted what you wanted? Thanks!
The reward training uses a different dataset to begin with (see here). So we need to make sure the splits between these datasets are consistent, in case we want to avoid using the same data for the sft and reward training, which is still up for debate.
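One way to get consistent splits across two different datasets is to assign each example to train/val by hashing a key both datasets share, instead of shuffling each dataset independently. The sketch below is not the Open-Assistant implementation. It assumes every row carries a common identifier for its conversation tree; `message_tree_id`, `assign_split`, and `split_dataset` are hypothetical names used only for illustration.

```python
import hashlib


def assign_split(key: str, val_fraction: float = 0.1, salt: str = "") -> str:
    """Deterministically map a key to "train" or "val".

    The same key (and salt) always lands in the same split, regardless of
    dataset size or row order, so datasets sharing keys split consistently.
    """
    if not 0.0 <= val_fraction <= 1.0:
        raise ValueError("val_fraction must be in [0, 1]")
    digest = hashlib.sha256((salt + key).encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned 64-bit bucket and compare against
    # an integer threshold (avoids float rounding at the upper edge).
    bucket = int.from_bytes(digest[:8], "big")
    threshold = int(val_fraction * 2**64)
    return "val" if bucket < threshold else "train"


def split_dataset(rows, key_field="message_tree_id", val_fraction=0.1, salt=""):
    """Split a list of dict rows using the hash of a shared key field.

    key_field is a hypothetical column name assumed to exist in every row.
    """
    train, val = [], []
    for row in rows:
        if assign_split(row[key_field], val_fraction, salt) == "val":
            val.append(row)
        else:
            train.append(row)
    return train, val
```

Calling `split_dataset` with the same `salt` on the sft and reward data puts any shared tree in the same split in both. Passing a different salt (for example `salt="rl"`) gives an independent split, which is one possible reading of "rl should be different".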