SimPO
Confused about number of steps
Hi, I saw your training curve for Gemma 9B SimPO here: https://wandb.ai/yumeng0818/simpo/runs/4w25j650?nw=nwuseryumeng0818. How is it that there are only 92 steps? At a batch size of 128, that would be only ~11k total examples seen, but there are ~60k examples in the dataset. Thanks.
Hi @cinjon, did you figure it out? It's confusing. Also, the actual effective batch size seems to be 256 (2 * 8 * 16), so there should be about 232 steps for 1 epoch.
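For reference, the back-of-the-envelope numbers (assuming ~60k preference pairs, which is roughly the dataset size mentioned above):

```python
# Rough steps-per-epoch arithmetic. Assumptions: ~60k preference pairs,
# and an effective batch size of either 128 or 2 * 8 * 16 = 256.
dataset_size = 60_000

for effective_batch in (128, 2 * 8 * 16):
    print(f"batch {effective_batch}: ~{dataset_size // effective_batch} steps per epoch")

# batch 128: ~468 steps per epoch
# batch 256: ~234 steps per epoch
# Neither is anywhere near 92; 92 steps at batch 128 only covers
# 92 * 128 = 11,776 examples, which is the ~11k figure above.
```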
- Still confused, but our training runs are reasonable, so I gave up trying to guess theirs.
- Yeah, I was confused about whether it was 128 or 256.
- I'm also confused about their eval templates and scores on Gemma.
@cinjon I tried to use TRL's implementation (https://huggingface.co/docs/trl/cpo_trainer#simple-preference-optimization-simpo) for training runs, but I cannot reproduce their Gemma2-9B-it-SimPO model. The resulting model after 1 epoch on the dataset is much worse. I noticed there is another PR by the authors to create a separate SimPOTrainer in TRL (https://github.com/huggingface/trl/pull/1725); I hope that fixes the issues.
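For context, this is roughly the setup I mean: a minimal sketch of SimPO via TRL's CPOTrainer. The hyperparameters and dataset below are placeholders, not necessarily the authors' exact recipe.

```python
# Minimal sketch of SimPO through TRL's CPOTrainer (loss_type="simpo", cpo_alpha=0).
# Hyperparameter values below are illustrative placeholders, not the authors' config.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model_name = "google/gemma-2-9b-it"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Any preference dataset with chosen/rejected pairs works here; this one is a placeholder
# standing in for the ~60k-pair dataset discussed above.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = CPOConfig(
    output_dir="gemma2-9b-it-simpo",
    loss_type="simpo",           # selects the SimPO loss inside CPOTrainer
    cpo_alpha=0.0,               # disables the BC regularizer -> pure SimPO
    simpo_gamma=0.5,             # target reward margin (placeholder value)
    beta=10.0,                   # SimPO typically uses a much larger beta than DPO (placeholder)
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=8e-7,
    bf16=True,
)

trainer = CPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions use tokenizer=tokenizer instead
)
trainer.train()
```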
Hi,
I think this is a wandb display issue -- please make sure you set the x-axis to be train/global_step (figure 2) instead of step (figure 1). The former shows the real training step while the latter shows the logging step (we logged once every 5 steps).
Best, Yu
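For anyone else who hits this, a quick sketch (not the actual training code, just an illustration with made-up numbers) of why the default wandb step axis tops out around 92 when metrics are only logged every 5 steps:

```python
# Illustration of wandb's built-in "step" axis vs. a logged "train/global_step".
# Assumption: metrics are logged once every 5 optimizer steps, as described above.
import wandb

run = wandb.init(project="simpo-demo")

logging_steps = 5
total_steps = 460  # hypothetical number of optimizer steps, chosen so 460 / 5 = 92

for global_step in range(1, total_steps + 1):
    if global_step % logging_steps == 0:
        # Each wandb.log() call advances the built-in "step" axis by 1, so the
        # default x-axis only reaches total_steps / logging_steps (~92 here),
        # while "train/global_step" shows the real training step.
        run.log({"train/loss": 1.0 / global_step,
                 "train/global_step": global_step})

run.finish()
```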