Xiangdong Zhang

Results: 2 issues by Xiangdong Zhang

Hi. I’ve observed that the learning rate tends to be higher during fine-tuning when using more GPUs. For instance, the group with 4 GPUs (gradient_accumulation_steps=1) and the group with 2...

Hi. When I trained CogVideoX **2B** on 10,000 videos with a batch size of 16 (approximately 2k steps), using a **learning rate of 1e-5** and a **weight decay of 1e-4**,...