nebuly Input/completion in reward training vs. RL training

Open menandro opened this issue 1 year ago • 1 comment

Why is it that in reward training the input and completion are concatenated as user_input + " " + completion (reward.py line 254), whereas in RL training the equivalent task_response is built as input + "\n" + completion (trainer.py line 680)?

Mar 10 '23 09:03 menandro

Hi @menandro, thank you for reporting the mismatch! I think it's just a typo, and we should fix it as soon as possible. Would you mind opening a PR for it? I think it makes much more sense to use the format from reward.py (user_input + " " + completion) in both places.

Mar 10 '23 10:03 diegofiori
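
For anyone picking this up, here is a minimal sketch of the mismatch and one possible way to remove it. It assumes the fix is to route both code paths through a single helper. The names SEPARATOR and join_prompt_completion are hypothetical and do not come from the repository; only the two separators (" " and "\n") are taken from the issue. The concern is that a reward model trained on space-joined text would otherwise be asked to score newline-joined text during RL, which is a different string and likely tokenizes differently.

```python
# Hypothetical helper: these names are illustrative, not the repository's API.
SEPARATOR = " "  # separator used in reward.py, per the issue


def join_prompt_completion(user_input: str, completion: str, sep: str = SEPARATOR) -> str:
    """Join a prompt and its completion with one shared separator."""
    return user_input + sep + completion


# What reward training builds, according to the issue.
reward_text = join_prompt_completion("Hello", "world")

# What RL training builds today, according to the issue (newline separator).
rl_text_current = join_prompt_completion("Hello", "world", sep="\n")

print(repr(reward_text))               # 'Hello world'
print(repr(rl_text_current))           # 'Hello\nworld'
print(reward_text == rl_text_current)  # False: the two stages see different strings
```

If both reward.py and trainer.py called a shared helper like this with the default separator, the two stages could not drift apart again.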
