
SFT LoRA ends with higher loss

Randl opened this issue 1 year ago · 1 comment

I've run the training on two machines without changing any hyperparameters except the per-device batch size and gradient accumulation steps, adjusted so that the global batch size matches the recipe. The first run follows the repo exactly and gets an eval loss of 1.0667: https://wandb.ai/evgeniizh/huggingface/runs/pskgg48d. The second run adds warmup (https://github.com/huggingface/alignment-handbook/pull/31, https://github.com/huggingface/alignment-handbook/pull/71) and uses TRL from master (which fixes https://github.com/huggingface/alignment-handbook/issues/61); it gets an eval loss of 1.0927: https://wandb.ai/evgeniizh/huggingface/runs/9ez7kl7s
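For clarity, here is a minimal sketch of the kind of adjustment I mean; the values are illustrative, not my exact settings. The idea is to keep the product `per_device_train_batch_size × gradient_accumulation_steps × num_gpus` constant when moving between machines:

```yaml
# Global batch size = per_device_train_batch_size
#                   * gradient_accumulation_steps
#                   * num_gpus
# e.g. on an 8-GPU machine (illustrative values, not the exact recipe):
per_device_train_batch_size: 4
gradient_accumulation_steps: 2   # 4 * 2 * 8 = 64

# On a 4-GPU machine, double the accumulation to keep 64:
#   per_device_train_batch_size: 4
#   gradient_accumulation_steps: 4   # 4 * 4 * 4 = 64
```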

The official SFT model gets a much lower loss of 0.99: https://huggingface.co/alignment-handbook/zephyr-7b-sft-lora

Randl avatar Dec 09 '23 09:12 Randl

Possibly related to https://github.com/huggingface/alignment-handbook/issues/45

Randl avatar Dec 12 '23 10:12 Randl