
Benchmarking and optimization tips

Open 0xbitches opened this issue 1 year ago • 5 comments

Not exactly an issue, but I have just been trying to run one epoch of finetuning with llama-13b. On a 4090 it looks like it will take roughly 4 hours with the setting `MICRO_BATCH_SIZE = 2`.

However, it looks like the loss has already converged to ~1 by epoch 0.12 (roughly 30 minutes into training), so it doesn't really make sense to use epochs=3, and a larger micro batch size could potentially be used instead.

I could be wrong here. Happy to hear some feedback on how to better tune the parameters.

0xbitches avatar Mar 15 '23 21:03 0xbitches
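For context, here is a minimal sketch of the hyperparameter block in `finetune.py` that this discussion refers to. The values are approximate defaults from around this time and may differ from the current script; the point is that lowering `MICRO_BATCH_SIZE` only changes the per-step batch, since gradient accumulation keeps the effective batch size fixed.

```python
# Sketch of the top-level hyperparameters in finetune.py (approximate values;
# check the script in the repo for the actual current defaults).
MICRO_BATCH_SIZE = 2          # per-step batch size; lowered to fit a 4090
BATCH_SIZE = 128              # effective batch size
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3                    # the thread suggests 1-2 may already be enough
LEARNING_RATE = 3e-4
CUTOFF_LEN = 256              # max token length per training example
```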

Also, it would be great if we could get 4-bit support by incorporating GPTQ (#2).

0xbitches avatar Mar 15 '23 21:03 0xbitches

With 256 tokens the loss slowly pulls further down to somewhere slightly above 0.8. You could maybe get away with using 2 epochs instead of 3, though.

tloen avatar Mar 16 '23 04:03 tloen
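For reference, a rough sketch of how a 256-token cutoff is typically applied when tokenizing the Alpaca prompts. The `tokenize` helper shown here is illustrative, not necessarily the exact function in `finetune.py`; the idea is that a longer cutoff keeps more of each response in the loss, which is one reason the observed loss floor depends on it.

```python
CUTOFF_LEN = 256  # prompts longer than this are truncated

def tokenize(prompt, tokenizer):
    # Truncate/pad each example to CUTOFF_LEN tokens. Text beyond the cutoff
    # never contributes to the training loss, so the cutoff length shifts the
    # loss values the thread is comparing.
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=CUTOFF_LEN,
        padding="max_length",
    )
    return {
        "input_ids": result["input_ids"],
        "attention_mask": result["attention_mask"],
    }
```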

> With 256 tokens the loss slowly pulls further down to somewhere slightly above 0.8. You could maybe get away with using 2 epochs instead of 3, though.

Yeah, I definitely saw it drop below 0.75 somewhere between epochs 1 and 2. You could still achieve a pretty good loss with just one epoch, though. I was testing this in a hurry, so I'm just sharing this information here.

0xbitches avatar Mar 16 '23 04:03 0xbitches

Did you get below 0.75 with the current hyperparams? I wasn't able to get under 0.8. Wondering what others are getting (I'm using an A100 40GB).

kesar avatar Mar 16 '23 16:03 kesar

I probably wouldn't anchor too much on the specific loss numbers until we've refactored the training code to use validation sets.

tloen avatar Mar 16 '23 17:03 tloen
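As a rough idea of what that refactor could look like, here is a minimal sketch of holding out a validation set with `datasets.train_test_split` and evaluating it periodically via the `transformers.Trainer`. The 5% split fraction and `eval_steps=200` are illustrative assumptions, not settings from the repo, and `model`, `tokenizer`, and the hyperparameter constants are assumed to be defined as in `finetune.py`.

```python
from datasets import load_dataset
import transformers

data = load_dataset("json", data_files="alpaca_data.json")

# Hold out a small validation set so reported loss reflects generalization
# rather than training loss alone (split fraction is an assumption).
split = data["train"].train_test_split(test_size=0.05, seed=42)
train_data, val_data = split["train"], split["test"]

trainer = transformers.Trainer(
    model=model,                      # PEFT-wrapped LLaMA model, defined elsewhere
    train_dataset=train_data,
    eval_dataset=val_data,
    args=transformers.TrainingArguments(
        output_dir="lora-alpaca",
        per_device_train_batch_size=MICRO_BATCH_SIZE,
        gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
        num_train_epochs=EPOCHS,
        learning_rate=LEARNING_RATE,
        evaluation_strategy="steps",  # run validation periodically during training
        eval_steps=200,
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```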