following-instructions-human-feedback

Where can I find the experimental comparison: using the reward model's training data for fine-tuning without reinforcement learning?

Open • guotong1988 opened this issue 1 year ago • 0 comments

Thank you very much!

guotong1988, Apr 18 '23 03:04