
Questions about lora training setting for CogVideoX 5B

Open aHapBean opened this issue 9 months ago • 5 comments

Hi,

When I trained CogVideoX 2B on 10,000 videos with a batch size of 16 (approximately 2k steps), using a learning rate of 1e-5 and a weight decay of 1e-4, the generated results were normal:

https://github.com/user-attachments/assets/006ab506-4f21-4dce-89cb-42f0615f5ea4

However, when I fine-tuned CogVideoX 5B LoRA on the same 10,000 videos with a batch size of 8 (around 2600 steps), using a learning rate of 2e-5 and weight decay of 1e-4, the generated results became abnormal (over-saturated). The LoRA rank is 128, and the LoRA alpha is set to 64:

https://github.com/user-attachments/assets/84c0563b-34aa-42ed-a580-37234c81e9cd

As the training progresses, this phenomenon becomes more pronounced.

The original generated results of 5B model:

https://github.com/user-attachments/assets/87b56a6e-5f9c-41a4-92b5-6ba975058ae3

I have researched this issue and found that the abnormal results might be caused by the learning rate and weight decay settings for the 5B LoRA model. Do you have any recommendations for adjusting these settings? Or could this issue be due to other factors?
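For reference (this is background, not something stated in the thread): in the standard LoRA formulation the learned low-rank update is multiplied by alpha / rank before being added to the frozen weights, so the rank-128 / alpha-64 setting above scales the update by 0.5. A minimal sketch of that arithmetic:

```python
def lora_scale(rank: int, alpha: int) -> float:
    # Standard LoRA applies the low-rank update (B @ A) scaled by alpha / rank,
    # so alpha effectively rescales how strongly the adapter perturbs the model.
    return alpha / rank

# Settings from this issue: rank 128, alpha 64 -> the update is damped by 0.5,
# which interacts multiplicatively with the learning rate.
print(lora_scale(128, 64))  # 0.5
```

Because this factor multiplies the update, lowering alpha (or raising rank) is an alternative knob to lowering the learning rate when the fine-tune drifts toward over-saturation.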

aHapBean avatar Mar 27 '25 10:03 aHapBean

Additionally, I tried learning rates of 1e-3, 1e-4, 1e-5, 1e-6, and 1e-7, with a weight decay of 1e-3. Only the results for 1e-6 and 1e-7 were normal, but these values are too small for LoRA fine-tuning to effectively enable the model to learn new knowledge.

So weird.

aHapBean avatar Mar 28 '25 03:03 aHapBean

Hi, when I fine-tune the CogVideoX T2V 5B model using LoRA, my loss turns to NaN after 1-2 steps. Have you faced this issue? I have tried reducing the learning rate.

Kartik-3004 avatar Mar 28 '25 16:03 Kartik-3004

> Hi, when I fine-tune the CogVideoX T2V 5B model using LoRA, my loss turns to NaN after 1-2 steps. Have you faced this issue? I have tried reducing the learning rate.

I haven't met this issue.
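One common, hypothetical mitigation for a loss that goes NaN this early (not a fix suggested in this thread; it assumes a standard PyTorch-style loop where the scalar loss is available as a Python float) is to skip the optimizer step whenever the loss is non-finite, usually alongside gradient clipping such as `torch.nn.utils.clip_grad_norm_`:

```python
import math

def should_step(loss: float) -> bool:
    """Return True if the optimizer step should run.

    Minimal sketch: a NaN/Inf loss produces NaN gradients, and applying them
    permanently corrupts the weights, so the step is skipped instead.
    """
    return math.isfinite(loss)

print(should_step(0.42))          # True  -> apply optimizer.step()
print(should_step(float("nan")))  # False -> zero grads and skip this batch
```

If the loss is NaN from the very first steps, the usual suspects are mixed-precision overflow (fp16 without loss scaling) or corrupted samples in the data pipeline, rather than the learning rate alone.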

aHapBean avatar Mar 29 '25 04:03 aHapBean

> When I trained CogVideoX 2B on 10,000 videos with a batch size of 16 (approximately 2k steps), using a learning rate of 1e-5 and a weight decay of 1e-4, the generated results were normal. However, when I fine-tuned CogVideoX 5B LoRA on the same 10,000 videos with a batch size of 8 (around 2600 steps), using a learning rate of 2e-5 and weight decay of 1e-4, the generated results became abnormal (over-saturated). The LoRA rank is 128, and the LoRA alpha is set to 64. [...] Do you have any recommendations for adjusting these settings? Or could this issue be due to other factors?

I also encountered the same problem when training a CogVideoX1.5-I2V-5B LoRA with a small number of videos; it is especially evident in the first frame of the video. Do you have any good suggestions?

HuazhangHu avatar Apr 03 '25 03:04 HuazhangHu

> I also encountered the same problem when training a CogVideoX1.5-I2V-5B LoRA with a small number of videos; it is especially evident in the first frame of the video. Do you have any good suggestions?

I have no solution yet. May I see your generated videos?

aHapBean avatar Apr 03 '25 03:04 aHapBean