trl Fix GPT2 sentiment notebook reward

Open cemiu opened this issue 8 months ago • 4 comments

I tried reproducing the notebook, but the model's performance barely improved, and after a bit of digging I found the issue.

The sentiment pipeline used to return its output in a fixed order, [NEGATIVE, POSITIVE], whereas now the higher-confidence class always comes first:

# before
[[{'label': 'NEGATIVE', 'score': -2.2947897911071777},
  {'label': 'POSITIVE', 'score': 2.557039737701416}]]

# now
[[{'label': 'POSITIVE', 'score': 2.557039737701416},
  {'label': 'NEGATIVE', 'score': -2.2947897911071777}]]

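The notebook used to select the positive sentiment score by index. With the new ordering, that index no longer reliably points at the POSITIVE entry, so the reward is practically random. I've adjusted the training loop and eval functions to select positive sentiment again.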
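
For illustration, here is a minimal sketch of one way to pick the positive score regardless of ordering: look it up by label instead of by position. The helper name `positive_score` is hypothetical and not necessarily what the notebook uses; the sample data is copied from the two outputs above.

```python
def positive_score(outputs):
    """Return the score of the entry labelled POSITIVE in one pipeline result."""
    for item in outputs:
        if item["label"] == "POSITIVE":
            return item["score"]
    raise ValueError("no POSITIVE label in pipeline output")


# The two orderings shown above, for a single input text.
before = [
    {"label": "NEGATIVE", "score": -2.2947897911071777},
    {"label": "POSITIVE", "score": 2.557039737701416},
]
now = [
    {"label": "POSITIVE", "score": 2.557039737701416},
    {"label": "NEGATIVE", "score": -2.2947897911071777},
]

# Label-based lookup gives the same reward for either ordering.
rewards = [positive_score(result) for result in (before, now)]
```

Looking up by label keeps the reward independent of how the pipeline orders its results, at the cost of assuming the label string is exactly `POSITIVE`.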

Jun 14 '24 17:06 cemiu
