SimPO
Confused about number of steps
Hi, I saw your training curve for Gemma 9B SimPO here: https://wandb.ai/yumeng0818/simpo/runs/4w25j650?nw=nwuseryumeng0818. How is it that there are only 92 steps? At a batch size of 128, that would be only ~11k total examples seen, but there are ~60k examples in the dataset. Thanks.
Hi @cinjon, did you figure it out? It's confusing. Also, the actual effective batch size seems to be 256 (2 * 8 * 16), so there should be about 232 steps for 1 epoch.
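For reference, the back-of-the-envelope numbers (assuming ~60k preference pairs, which is roughly the dataset size mentioned above):

```python
# Rough steps-per-epoch arithmetic. Assumptions: ~60k preference pairs,
# and an effective batch size of either 128 or 2 * 8 * 16 = 256.
dataset_size = 60_000

for effective_batch in (128, 2 * 8 * 16):
    print(f"batch {effective_batch}: ~{dataset_size // effective_batch} steps per epoch")

# batch 128: ~468 steps per epoch
# batch 256: ~234 steps per epoch
# Neither is anywhere near 92; 92 steps at batch 128 only covers
# 92 * 128 = 11,776 examples, which is the ~11k figure above.
```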
- Still confused, but our training runs are reasonable, so I gave up trying to guess theirs.
- Yeah, I was confused about whether it was 128 or 256.
- I'm also confused about their eval templates and scores on Gemma.
@cinjon I tried to use TRL's implementation (https://huggingface.co/docs/trl/cpo_trainer#simple-preference-optimization-simpo) for training runs, but I cannot reproduce their Gemma2-9B-it-SimPO model. The resulting model after 1 epoch on the dataset is much worse. I noticed there is another PR by the authors to create a separate SimPOTrainer in TRL (https://github.com/huggingface/trl/pull/1725); I hope that fixes the issues.
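For context, this is roughly the setup I mean: a minimal sketch of SimPO via TRL's CPOTrainer. The hyperparameters and dataset below are placeholders, not necessarily the authors' exact recipe.

```python
# Minimal sketch of SimPO through TRL's CPOTrainer (loss_type="simpo", cpo_alpha=0).
# Hyperparameter values below are illustrative placeholders, not the authors' config.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model_name = "google/gemma-2-9b-it"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Any preference dataset with chosen/rejected pairs works here; this one is a placeholder
# standing in for the ~60k-pair dataset discussed above.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = CPOConfig(
    output_dir="gemma2-9b-it-simpo",
    loss_type="simpo",           # selects the SimPO loss inside CPOTrainer
    cpo_alpha=0.0,               # disables the BC regularizer -> pure SimPO
    simpo_gamma=0.5,             # target reward margin (placeholder value)
    beta=10.0,                   # SimPO typically uses a much larger beta than DPO (placeholder)
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=8e-7,
    bf16=True,
)

trainer = CPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions use tokenizer=tokenizer instead
)
trainer.train()
```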
Hi,
I think this is a wandb display issue -- please make sure you set the x-axis to be train/global_step (figure 2) instead of step (figure 1). The former shows the real training step while the latter shows the logging step (we logged once every 5 steps).
Best, Yu
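For anyone else who hits this, a quick sketch (not the actual training code, just an illustration with made-up numbers) of why the default wandb step axis tops out around 92 when metrics are only logged every 5 steps:

```python
# Illustration of wandb's built-in "step" axis vs. a logged "train/global_step".
# Assumption: metrics are logged once every 5 optimizer steps, as described above.
import wandb

run = wandb.init(project="simpo-demo")

logging_steps = 5
total_steps = 460  # hypothetical number of optimizer steps, chosen so 460 / 5 = 92

for global_step in range(1, total_steps + 1):
    if global_step % logging_steps == 0:
        # Each wandb.log() call advances the built-in "step" axis by 1, so the
        # default x-axis only reaches total_steps / logging_steps (~92 here),
        # while "train/global_step" shows the real training step.
        run.log({"train/loss": 1.0 / global_step,
                 "train/global_step": global_step})

run.finish()
```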