Open-Sora Training Loss doesn't go below 1

I am training on my own data with the stage 1 training setup for v1.2, and the loss doesn't go below 1. Do you have any suggestions as to why this happens?

Jul 02 '24 18:07 qian18long

I have a similar problem. I fine-tuned Open-Sora v1.2 on my own dataset using learning rates of [1e-4, 2e-5], and the loss curves always fluctuate around 1. After about 70k training steps, I evaluated the model and, unfortunately, found that the quality of the generated videos was not as expected.
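The thread doesn't say what absolute loss value to expect. As a hedged illustration (a toy model, not Open-Sora's data or its exact loss), the sketch below shows that a rectified-flow-style velocity MSE has a nonzero floor set by the data itself. The convention `x_t = (1 - t) * x0 + t * eps` with target `v = eps - x0` and standard Gaussian "data" is an assumption chosen for tractability; under it, no model can get below 1 at any timestep. Real video latents are far more structured, so the achievable loss is lower, but the absolute number depends on the dataset and on how the loss is normalized, which is why comparing against a reference run is more informative than the raw value.

```python
import numpy as np

# Toy convention (an assumption, not necessarily Open-Sora's exact one):
#   x_t = (1 - t) * x0 + t * eps,  target v = eps - x0,  x0, eps ~ N(0, 1) independent.

def irreducible_velocity_mse(t: float) -> float:
    """Lowest achievable per-element MSE for predicting v from x_t at time t.

    For jointly Gaussian (v, x_t) the best predictor E[v | x_t] is linear, so the
    irreducible error is Var(v) - Cov(v, x_t)^2 / Var(x_t).
    """
    var_v = 2.0                      # Var(eps) + Var(x0)
    cov = t - (1.0 - t)              # Cov(eps - x0, (1 - t) x0 + t eps) = 2t - 1
    var_x = (1.0 - t) ** 2 + t ** 2  # never below 0.5, so no division by zero
    return var_v - cov ** 2 / var_x

def monte_carlo_velocity_mse(t: float, n: int = 200_000, seed: int = 0) -> float:
    """Empirical check: fit the best linear predictor on samples and measure its MSE."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n)
    eps = rng.standard_normal(n)
    x_t = (1.0 - t) * x0 + t * eps
    v = eps - x0
    slope = np.dot(v, x_t) / np.dot(x_t, x_t)  # least-squares fit through the origin
    return float(np.mean((v - slope * x_t) ** 2))

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  floor={irreducible_velocity_mse(t):.3f}  "
          f"monte_carlo={monte_carlo_velocity_mse(t):.3f}")
```

The floor is exactly 1 at the endpoints and peaks at 2 when `t = 0.5`, where `x_t` carries no linear information about `v` in this toy setting.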

Jul 07 '24 11:07 tinnerhrhe

This is quite likely due to the dataset. Could you first follow our stage 1 (https://github.com/hpcaitech/Open-Sora/blob/main/docs/report_03.md#first-stage) before continuing training on your own dataset?

Jul 08 '24 21:07 JThh

You may also want to check whether the pretrained model is loaded successfully. When I fine-tune (SFT) Open-Sora v1.2, the training loss starts at about 0.4 to 0.5, but if the checkpoint is not loaded, it starts at about 4 to 5.
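One way to act on this suggestion, sketched in plain Python so it doesn't depend on Open-Sora's own loading code, is to compare parameter names and shapes between the model and the checkpoint. In PyTorch, `model.load_state_dict(state_dict, strict=False)` also returns the `missing_keys` and `unexpected_keys`, which are worth printing. The helper name `diff_state_dicts` and its `strip_prefix` option (for example a `module.` prefix left by DDP wrapping) are hypothetical; the prefix is a common cause of silent load failures, not something observed in this thread.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]

def diff_state_dicts(model_shapes: Dict[str, Shape],
                     ckpt_shapes: Dict[str, Shape],
                     strip_prefix: str = "") -> Dict[str, List[str]]:
    """Compare parameter names/shapes of a model against a checkpoint.

    Both arguments map parameter name -> shape. `strip_prefix` (hypothetical option)
    is removed from checkpoint keys that start with it, e.g. "module.".
    """
    if strip_prefix:
        ckpt_shapes = {
            (k[len(strip_prefix):] if k.startswith(strip_prefix) else k): v
            for k, v in ckpt_shapes.items()
        }
    missing = sorted(k for k in model_shapes if k not in ckpt_shapes)
    unexpected = sorted(k for k in ckpt_shapes if k not in model_shapes)
    mismatched = sorted(
        k for k in model_shapes
        if k in ckpt_shapes and tuple(model_shapes[k]) != tuple(ckpt_shapes[k])
    )
    return {"missing": missing, "unexpected": unexpected, "shape_mismatch": mismatched}

# Toy example; with PyTorch you would build each dict as
# {name: tuple(t.shape) for name, t in model.state_dict().items()}
model = {"blocks.0.attn.weight": (8, 8), "blocks.0.mlp.weight": (8,), "final.weight": (4,)}
ckpt = {"module.blocks.0.attn.weight": (8, 8),
        "module.blocks.0.mlp.weight": (16,),
        "module.extra": (1,)}
print(diff_state_dicts(model, ckpt, strip_prefix="module."))
```

Non-empty `missing` or `shape_mismatch` lists covering most of the transformer would be consistent with the much higher starting loss described above.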

Jul 09 '24 05:07 CIntellifusion

@qian18long did you manage to solve your problem?

Jul 10 '24 02:07 FrankLeeeee

Thanks for your suggestions! The videos in my dataset have only 2 to 7 frames each (num_frames), far fewer than the videos used to train Open-Sora. After carefully checking the code, I found that the original version only supports videos with more than 17 frames, so I revised the code related to temporal downsampling. I have attached the loss curve below. Is this curve normal, or similar to what you observed during training?

[Screenshot: training loss curve, 2024-07-10]
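The comment doesn't describe the change, so the following is not tinnerhrhe's method. A simpler alternative, assuming the pipeline only needs a minimum number of input frames (17, taken from the comment above), is to pad short clips by repeating their last frame before they reach the VAE. The function name and the `(T, H, W, C)` layout are hypothetical. The trade-off is that the model then sees static padded tails, which may bias it toward frozen motion.

```python
import numpy as np

def pad_clip_to_min_frames(frames: np.ndarray, min_frames: int = 17) -> np.ndarray:
    """Repeat the last frame until the clip has at least `min_frames` frames.

    `frames` is assumed to be laid out as (T, H, W, C) with T >= 1.
    Clips that are already long enough are returned unchanged.
    """
    if frames.ndim != 4 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty array of shape (T, H, W, C)")
    missing = min_frames - frames.shape[0]
    if missing <= 0:
        return frames
    # "edge" mode copies the boundary value, i.e. the last frame, along the time axis only
    return np.pad(frames, ((0, missing), (0, 0), (0, 0), (0, 0)), mode="edge")

short_clip = np.zeros((5, 256, 256, 3), dtype=np.uint8)
print(pad_clip_to_min_frames(short_clip).shape)
```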

Additionally, I have a question about the timestep_transform function in rectified_flow: what does it do? Similarly, what is the effect of the scale argument in PositionEmbedding2D? Why do you use them, and would removing them affect performance?
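No maintainer answered this in the thread. As a hedged sketch, assuming timestep_transform implements an SD3-style, resolution-dependent timestep shift (the formula below is that shift; the helper names, the 512x512 base resolution, and counting frames in latent frames are assumptions to check against the repository), the idea is that larger or longer videos are trained at shifted, typically noisier timesteps. With a ratio of 1 the shift is the identity, so removing the transform amounts to training on unshifted timesteps, a different noise-level distribution from the one the pretrained weights presumably saw. A ratio of 0 would map every timestep to 0, so if the frame count feeding the ratio can round down to 0 for very short clips, that case is worth checking.

```python
import math

def timestep_shift_ratio(height: int, width: int, num_latent_frames: int,
                         base_resolution: int = 512 * 512,
                         base_num_frames: int = 1,
                         scale: float = 1.0) -> float:
    """Hypothetical helper: shift strength grows with sqrt(pixels) and sqrt(frames)."""
    ratio_space = math.sqrt(height * width / base_resolution)
    ratio_time = math.sqrt(num_latent_frames / base_num_frames)
    return ratio_space * ratio_time * scale

def shift_timestep(t: float, ratio: float) -> float:
    """SD3-style shift of a normalized timestep t in [0, 1].

    ratio == 1 leaves t unchanged; ratio > 1 moves t toward 1 while keeping 0 and 1 fixed.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive; a zero ratio collapses all timesteps")
    return ratio * t / (1.0 + (ratio - 1.0) * t)

ratio = timestep_shift_ratio(720, 1280, 15)
print(round(ratio, 3), [round(shift_timestep(t, ratio), 3) for t in (0.1, 0.5, 0.9)])
```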

Jul 10 '24 04:07 tinnerhrhe

I think this is similar to mine, but I haven't evaluated my fine-tuned checkpoint on VBench yet.

Jul 10 '24 08:07 CIntellifusion

This issue is stale because it has been open for 7 days with no activity.

Jul 18 '24 01:07 github-actions[bot]

@tinnerhrhe Were you able to train with videos of fewer than 17 frames? Could you please elaborate on how you "revised the code related to temporal downsampling"? Thanks!

Jul 18 '24 22:07 Ir1d

@qian18long Hi, I also encountered a similar problem. Have you solved it?

Jul 27 '24 13:07 xunguo18

This issue is stale because it has been open for 7 days with no activity.

Sep 15 '24 02:09 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

Sep 22 '24 02:09 github-actions[bot]
