Open-Sora

Training Loss doesn't go below 1

Open qian18long opened this issue 1 year ago • 9 comments

I am running the stage 1 training for v1.2 on my own data. The loss doesn't go below 1. Do you have any suggestions as to why this happens?

Screenshot 2024-07-02 at 11 31 13 AM

qian18long avatar Jul 02 '24 18:07 qian18long

I have a similar problem. I fine-tuned Open-Sora v1.2 on my own dataset using a learning rate of [1e-4, 2e-5]. The loss curves always fluctuate around 1. After about 70k training steps, I evaluated the model and, unfortunately, the quality of the generated videos was not what I expected.

tinnerhrhe avatar Jul 07 '24 11:07 tinnerhrhe

This is most likely due to the dataset. Could you follow our stage 1 first (https://github.com/hpcaitech/Open-Sora/blob/main/docs/report_03.md#first-stage) before continuing training on your own dataset?

JThh avatar Jul 08 '24 21:07 JThh

Maybe you can also check whether the pretrained model is loaded successfully. My training loss for SFT of Open-Sora v1.2 starts at about 0.4~0.5, but if the checkpoint is not loaded, the loss starts at about 4~5.
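
One way to check this is to compare the parameter names the model expects with the names stored in the checkpoint. The sketch below is a minimal, framework-free illustration of that comparison; the function name `diff_checkpoint_keys` and the example key names are hypothetical and are not taken from the Open-Sora code base. In PyTorch, `model.load_state_dict(state_dict, strict=False)` returns comparable `missing_keys` and `unexpected_keys` lists.

```python
from typing import Iterable, List, NamedTuple


class KeyDiff(NamedTuple):
    missing: List[str]     # expected by the model, absent from the checkpoint
    unexpected: List[str]  # present in the checkpoint, unknown to the model


def diff_checkpoint_keys(model_keys: Iterable[str], ckpt_keys: Iterable[str]) -> KeyDiff:
    """Compare parameter names of a model against those of a checkpoint."""
    model_set, ckpt_set = set(model_keys), set(ckpt_keys)
    return KeyDiff(
        missing=sorted(model_set - ckpt_set),
        unexpected=sorted(ckpt_set - model_set),
    )


# Hypothetical example: the checkpoint was saved with a "module." prefix,
# so no name matches and every weight would stay at its random initialization.
model_keys = ["blocks.0.attn.weight", "blocks.0.attn.bias"]
ckpt_keys = ["module.blocks.0.attn.weight", "module.blocks.0.attn.bias"]

diff = diff_checkpoint_keys(model_keys, ckpt_keys)
if diff.missing or diff.unexpected:
    print(f"{len(diff.missing)} missing, {len(diff.unexpected)} unexpected keys")
```

If both lists are empty, the names at least line up; a large number of missing keys would be consistent with the loss starting as high as it does for an unloaded checkpoint.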

CIntellifusion avatar Jul 09 '24 05:07 CIntellifusion

@qian18long did you manage to solve your problem?

FrankLeeeee avatar Jul 10 '24 02:07 FrankLeeeee

Thanks for your suggestions! The num_frames in my own videos ranges from 2 to 7, which is much smaller than in the videos used in Open-Sora. After carefully checking the code, I found that the original version only supports videos with more than 17 frames. Therefore, I revised the code related to temporal downsampling. I have attached the loss curve below. Is this curve normal, or similar to what you observed during training?

Screenshot 2024-07-10 at 12 43 40 PM

Additionally, I have a question about the timestep_transform function in rectified_flow: what is the effect of this function? Similarly, what is the effect of the scale arg in PositionEmbedding2D? Why do you use them? Will removing them affect performance?
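
For context on the first question, a timestep shift of the kind commonly used in rectified-flow training can be sketched as below. This is only an illustration of the general technique under stated assumptions, not a copy of Open-Sora's timestep_transform: the names `resolution_ratio` and `shift_timestep`, the base resolution of 512 x 512, and the square-root scaling are all hypothetical choices made for the example.

```python
import math


def resolution_ratio(height: int, width: int,
                     base_resolution: int = 512 * 512,
                     scale: float = 1.0) -> float:
    """Hypothetical shift factor that grows with the pixel count of the sample."""
    return math.sqrt(height * width / base_resolution) * scale


def shift_timestep(t: float, ratio: float) -> float:
    """Map a normalized timestep t in [0, 1] to a shifted value in [0, 1].

    ratio == 1 is the identity; ratio > 1 moves every interior t toward 1,
    while the endpoints 0 and 1 stay fixed.
    """
    return ratio * t / (1.0 + (ratio - 1.0) * t)


# With the assumed base resolution, a 1024 x 1024 sample gets ratio 2,
# so a timestep drawn at 0.5 is moved to about 0.667.
ratio = resolution_ratio(1024, 1024)
print(ratio, shift_timestep(0.5, ratio))
```

Under these assumptions, removing the shift is the same as fixing the ratio at 1, which changes how training timesteps are distributed for larger inputs rather than changing the model architecture.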

tinnerhrhe avatar Jul 10 '24 04:07 tinnerhrhe

I think this is similar to mine, but I haven't evaluated my fine-tuned checkpoint on VBench yet.

CIntellifusion avatar Jul 10 '24 08:07 CIntellifusion

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar Jul 18 '24 01:07 github-actions[bot]

@tinnerhrhe Were you able to train with videos of fewer than 17 frames? Could you please elaborate on how you "revised the code related to temporal downsampling"? Thanks!

Ir1d avatar Jul 18 '24 22:07 Ir1d

@qian18long Hi, I also encountered a similar problem. Have you solved it?

xunguo18 avatar Jul 27 '24 13:07 xunguo18

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar Sep 15 '24 02:09 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Sep 22 '24 02:09 github-actions[bot]