RL4LMs
A question has bothered me for a long time: what is the difference between using RL for text generation and simply discarding the model predictions that receive a reward of 0?
I am asking in the context of text generation.
Thank you very much!