
Loss spike during training phase

Open yzxyzh opened this issue 1 year ago • 4 comments

Hi!

I'm experimenting with QLoRA on different datasets. I observe that the loss is generally decreasing in each test I run, but it also contains strange spikes. The loss values and decrease pattern are very similar to full-parameter fine-tuning, so I assume the final result should be fine, but those spikes are unexplainable to me. Similar spikes have never been observed during full-parameter fine-tuning. Any idea what causes these spikes?

The spikes are marked with red boxes. (Screenshot: Screen Shot 2023-05-30 at 8 14 03 AM)

Thanks

yzxyzh avatar May 30 '23 00:05 yzxyzh

Hi! I'm running into the same problem.

(Screenshot of loss curve)

baibaiw5 avatar May 30 '23 00:05 baibaiw5

Same here, but the trained model seems fine? (sort of)

Maxwell-Lyu avatar May 30 '23 14:05 Maxwell-Lyu

Because qlora.py uses group_by_length...

eggqq007 avatar May 31 '23 00:05 eggqq007

Yes, group_by_length batches examples of similar length together, which creates the oscillating pattern in the loss that you observe. Batches with shorter examples and those with longer ones have different loss values, since length tends to correlate with "difficulty".

Using this setting is not necessary, but it improves fine-tuning efficiency by reducing padding.
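For reference, here is a minimal sketch of how the flag is toggled, assuming the standard Hugging Face `transformers` `TrainingArguments` interface (the exact argument plumbing in qlora.py may differ, and the paths/values below are placeholders):

```python
from transformers import TrainingArguments

# group_by_length=True makes the Trainer sample batches of similar-length
# examples, which reduces padding but produces the oscillating loss curve
# discussed above. Setting it to False gives a smoother loss at the cost of
# more padding per batch.
args = TrainingArguments(
    output_dir="./output",              # placeholder output path
    per_device_train_batch_size=4,      # placeholder batch size
    group_by_length=True,               # set to False for a smoother loss curve
)
```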

artidoro avatar Jun 01 '23 16:06 artidoro