trl
Fix GPT2 sentiment notebook reward
I tried reproducing the notebook, but the model's performance barely improved, and after a bit of digging I found the issue.
The sentiment pipeline used to return its output in a fixed order, [NEGATIVE, POSITIVE], whereas now the class with the higher score always comes first:
# before
[[{'label': 'NEGATIVE', 'score': -2.2947897911071777},
{'label': 'POSITIVE', 'score': 2.557039737701416}]]
# now
[[{'label': 'POSITIVE', 'score': 2.557039737701416},
{'label': 'NEGATIVE', 'score': -2.2947897911071777}]]
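The notebook used to select the positive sentiment score by index, but with the new ordering that index no longer reliably points at the POSITIVE entry, so the reward is now practically random. I've adjusted the training loop and eval functions to select the positive sentiment again.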
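For illustration, here is a minimal sketch of selecting the positive score by label instead of by position. It is not necessarily the exact change made in this PR; `positive_score` is a hypothetical helper name, and the sample data is copied from the outputs shown above.

```python
def positive_score(text_output):
    """Return the score of the POSITIVE entry, wherever it sits in the list.

    `text_output` is the list of {'label', 'score'} dicts the pipeline
    returns for a single input text (hypothetical helper, not part of trl).
    """
    for entry in text_output:
        if entry["label"] == "POSITIVE":
            return entry["score"]
    raise ValueError("no POSITIVE entry in pipeline output")


# Sample outputs copied from the description above (one input text each).
before = [[{"label": "NEGATIVE", "score": -2.2947897911071777},
           {"label": "POSITIVE", "score": 2.557039737701416}]]
now = [[{"label": "POSITIVE", "score": 2.557039737701416},
        {"label": "NEGATIVE", "score": -2.2947897911071777}]]

# One reward per input text, independent of the ordering.
rewards_before = [positive_score(out) for out in before]
rewards_now = [positive_score(out) for out in now]
```

Both orderings give the same reward here, which index-based selection no longer guarantees.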