
OA dataset consistent splits #1661

Open bethanyconnolly opened this issue 1 year ago • 4 comments

bethanyconnolly avatar Feb 26 '23 17:02 bethanyconnolly

:x: pre-commit failed. Please run `pre-commit run --all-files` locally and commit the changes. Find more information in the repository's CONTRIBUTING.md

github-actions[bot] avatar Feb 26 '23 17:02 github-actions[bot]

Not exactly what we had in mind, can you check again #1661?

sanagno avatar Feb 27 '23 08:02 sanagno

> Not exactly what we had in mind, can you check again #1661?

Hi, can you please clarify what you had in mind? In #1661 we discussed that the 'sft' and 'reward' datasets needed to use the same split, while 'rl' should use a different one. I think that is what I have done, but maybe I have misinterpreted what you wanted. Thanks!

bethanyconnolly avatar Feb 27 '23 15:02 bethanyconnolly

The reward training uses a different dataset to begin with, see here. We then need to make sure the splits across these datasets are consistent, in case we want to avoid using the same data for the sft and reward training, which is still up for debate.

sanagno avatar Mar 02 '23 10:03 sanagno
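
For illustration, one common way to keep splits consistent across separately built datasets is to derive the split from a hash of a shared key rather than from a per-dataset random shuffle. The sketch below is not the Open-Assistant implementation; the `prompt_id` field, the `assign_split` and `split_dataset` helpers, the salt, and the 20% validation fraction are all assumptions made for the example.

```python
import hashlib


def assign_split(key: str, val_fraction: float = 0.2, salt: str = "oa-split") -> str:
    """Map a key to 'train' or 'val' deterministically (hypothetical helper)."""
    digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "val" if bucket < val_fraction else "train"


def split_dataset(rows, key_field="prompt_id", val_fraction=0.2):
    """Partition rows by hashing the shared key field (assumed to exist)."""
    splits = {"train": [], "val": []}
    for row in rows:
        splits[assign_split(row[key_field], val_fraction)].append(row)
    return splits


# Two toy datasets that share some prompts (ids p25..p49 appear in both).
sft_rows = [{"prompt_id": f"p{i}", "text": "..."} for i in range(50)]
reward_rows = [{"prompt_id": f"p{i}", "ranking": []} for i in range(25, 75)]

sft_splits = split_dataset(sft_rows)
reward_splits = split_dataset(reward_rows)

# No prompt used for sft training ends up in the reward validation set.
sft_train_ids = {r["prompt_id"] for r in sft_splits["train"]}
reward_val_ids = {r["prompt_id"] for r in reward_splits["val"]}
leaked = sft_train_ids & reward_val_ids
```

Because the assignment depends only on the key and the salt, any dataset that carries the same key puts a given example on the same side of the split, regardless of row order or dataset size. Using a different salt for a third dataset (for example 'rl') would give it an independent split.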