trl

Fix GPT2 sentiment notebook reward

Open cemiu opened this issue 8 months ago • 4 comments

I tried reproducing the notebook, but the model's performance barely improved, and after a bit of digging I found the issue.

The sentiment pipeline used to return its scores in a fixed order, [NEGATIVE, POSITIVE], whereas now the higher-confidence class always comes first:

# before
[[{'label': 'NEGATIVE', 'score': -2.2947897911071777},
  {'label': 'POSITIVE', 'score': 2.557039737701416}]]

# now
[[{'label': 'POSITIVE', 'score': 2.557039737701416},
  {'label': 'NEGATIVE', 'score': -2.2947897911071777}]]

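The notebook selects the positive sentiment score by index, which used to work but is now practically random, since the position of the POSITIVE entry depends on which class scored higher. I've adjusted the training loop and eval functions to select positive sentiment again.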
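For illustration, here is a minimal sketch of selecting the score by label instead of by position. It is not the actual change made to the notebook; the helper name `positive_score` is hypothetical, and it assumes each pipeline result is a list of `{'label', 'score'}` dicts as in the outputs shown above.

```python
from typing import Dict, List, Union

# One entry of the pipeline output, e.g. {'label': 'POSITIVE', 'score': 2.55}
ScoreEntry = Dict[str, Union[str, float]]


def positive_score(entries: List[ScoreEntry], label: str = "POSITIVE") -> float:
    """Return the score for `label`, wherever it sits in the list.

    Hypothetical helper: looks the class up by name rather than by index,
    so it does not depend on how the pipeline orders its output.
    """
    for entry in entries:
        if entry["label"] == label:
            return float(entry["score"])
    raise KeyError(f"label {label!r} not found in pipeline output")


# The two orderings from the issue (inner lists only).
before = [
    {"label": "NEGATIVE", "score": -2.2947897911071777},
    {"label": "POSITIVE", "score": 2.557039737701416},
]
now = [
    {"label": "POSITIVE", "score": 2.557039737701416},
    {"label": "NEGATIVE", "score": -2.2947897911071777},
]

# Both orderings yield the same reward.
assert positive_score(before) == positive_score(now)
```

In a training loop, this lookup would be applied to each per-sample result in place of a fixed index.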

cemiu, Jun 14 '24 17:06