UniAnimate add noise scheduler

Hi, I am trying to fine-tune the model on my own dataset, but I am not very familiar with diffusion model training. Where in the code can I find the method/scheduler that adds noise? I ask because I noticed that the noise predicted by the denoising UNet goes through fairly complex calculations to produce the predicted VAE features of the video frames.

Jul 02 '24 13:07 ak01user

Hi, thanks for your attention. You can refer to this loss function (https://github.com/ali-vilab/UniAnimate/blob/549ee5fad7618500790929b0ae73151d36649045/tools/modules/diffusions/diffusion_ddim.py#L381) for more details.
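
For readers new to diffusion training, here is a minimal sketch of how noise is usually added in a DDPM-style forward process. It is not copied from the linked file: the linear beta schedule, the function names `linear_betas` and `q_sample`, and the latent shape are assumptions for illustration, and NumPy stands in for PyTorch.

```python
import numpy as np

def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    # Assumed linear schedule; the repository's config may use a different one.
    return np.linspace(beta_start, beta_end, num_steps, dtype=np.float64)

def q_sample(x0, t, alphas_cumprod, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    # Reshape the per-sample coefficients so they broadcast over all
    # non-batch dimensions of the latent.
    shape = (x0.shape[0],) + (1,) * (x0.ndim - 1)
    sqrt_abar = np.sqrt(alphas_cumprod[t]).reshape(shape)
    sqrt_one_minus_abar = np.sqrt(1.0 - alphas_cumprod[t]).reshape(shape)
    return sqrt_abar * x0 + sqrt_one_minus_abar * noise

rng = np.random.default_rng(0)
betas = linear_betas()
alphas_cumprod = np.cumprod(1.0 - betas)

# Hypothetical latent layout [batch, frames, channels, height, width].
x0 = rng.standard_normal((2, 8, 4, 12, 8))
t = np.array([10, 900])                  # one timestep per sample
noise = rng.standard_normal(x0.shape)
xt = q_sample(x0, t, alphas_cumprod, noise)
```

At small t the result stays close to x0, while at large t it is almost pure noise; the training loss then asks the UNet to recover `noise` from `xt` and `t`.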

Jul 02 '24 15:07 wangxiang1230

Hi, the model_kwargs look like the model_kwargs used for inference. Can you tell me what x0 is? Is it the VAE-encoded features of the ground-truth frames, e.g. with shape 1,sq,4,96,64?

Jul 03 '24 06:07 ak01user

Hi, sorry for the late reply. x0 is the original clean video VAE latents, and t is the timestep.
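
As a rough illustration of how x0 and t enter an 'mse' noise-prediction loss, and of the calculation that turns predicted noise back into predicted latents, here is a minimal sketch. It assumes a standard epsilon-prediction setup with a linear beta schedule; the helper names, the dummy model, and the latent shape are hypothetical and not taken from the UniAnimate code.

```python
import numpy as np

def _coef(values, t, ndim):
    # Pick one coefficient per sample and make it broadcastable.
    return values[t].reshape((len(t),) + (1,) * (ndim - 1))

def predict_x0_from_eps(xt, t, eps, alphas_cumprod):
    """Invert x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps for x0."""
    abar = _coef(alphas_cumprod, t, xt.ndim)
    return (xt - np.sqrt(1.0 - abar) * eps) / np.sqrt(abar)

def mse_noise_loss(model, x0, t, alphas_cumprod, rng):
    """Noise x0 to x_t, then score the model's noise prediction with MSE."""
    noise = rng.standard_normal(x0.shape)
    abar = _coef(alphas_cumprod, t, x0.ndim)
    xt = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise
    eps_pred = model(xt, t)
    return float(np.mean((eps_pred - noise) ** 2))

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 2e-2, 1000, dtype=np.float64)  # assumed schedule
alphas_cumprod = np.cumprod(1.0 - betas)

x0 = rng.standard_normal((2, 8, 4, 12, 8))  # hypothetical clean VAE latents
t = np.array([10, 900])

# Stand-in for the UNet: always predicts zero noise.
dummy_model = lambda xt, t: np.zeros_like(xt)
loss = mse_noise_loss(dummy_model, x0, t, alphas_cumprod, rng)

# With the true noise, the inversion recovers x0 exactly (up to rounding).
noise = rng.standard_normal(x0.shape)
abar = _coef(alphas_cumprod, t, x0.ndim)
xt = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise
x0_rec = predict_x0_from_eps(xt, t, noise, alphas_cumprod)
```

In a real training loop `dummy_model` would be the UNet called with the conditioning in model_kwargs, and the latents would come from the VAE encoder.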

Jul 04 '24 11:07 wangxiang1230

No problem, you are very kind. Good news: I fine-tuned on my own dataset, a single person's dancing video with 340 frames. I am training only the parameters of two blocks, ['local_image_embedding', 'local_image_embedding_after']. Looking forward to good results. Thanks very much.
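
One way to train only those two blocks is to freeze every other parameter by name. The sketch below is an assumption about how this could be done, not the code actually used here: `freeze_except` is a hypothetical helper, it assumes the two blocks are top-level attributes of the model, and the parameter names in the demo are made up. It accepts any iterable of (name, param) pairs, such as PyTorch's `model.named_parameters()`.

```python
from types import SimpleNamespace

TRAINABLE_BLOCKS = ("local_image_embedding", "local_image_embedding_after")

def freeze_except(named_params, trainable_blocks=TRAINABLE_BLOCKS):
    """Leave requires_grad=True only for parameters in the listed blocks.

    Returns the names of the parameters that remain trainable.
    """
    trainable = []
    for name, param in named_params:
        # Compare the top-level module name exactly rather than by prefix,
        # so an unrelated block sharing the prefix is not unfrozen by accident.
        top_level = name.split(".", 1)[0]
        param.requires_grad = top_level in trainable_blocks
        if param.requires_grad:
            trainable.append(name)
    return trainable

# Stand-in parameters with hypothetical names; with PyTorch, pass
# model.named_parameters() instead.
params = {
    "local_image_embedding.0.weight": SimpleNamespace(requires_grad=True),
    "local_image_embedding_after.blocks.0.norm.weight": SimpleNamespace(requires_grad=True),
    "input_blocks.0.0.weight": SimpleNamespace(requires_grad=True),
}
trainable = freeze_except(params.items())
print(trainable)
```

If the model is wrapped (for example in a distributed wrapper that prepends `module.` to every name), the top-level check would need to skip that prefix first. Only the parameters left trainable need to be passed to the optimizer.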

Jul 04 '24 15:07 ak01user

Hi, I tried fine-tuning on a specific person's data, but the results were not very good. How should I change my setup? I noticed that loss_type is 'mse' and var_type is 'fixed_small'. Is that right?

Jul 08 '24 06:07 ak01user

Hi, you are right: loss_type is 'mse' and var_type is 'fixed_small'. If you want to transfer the model to other domains, you may need to collect more data for training. Alternatively, you can try training only a subset of the parameters, though I'm not sure that will work. I trained the model on ~10K videos, so I don't know how it behaves when fine-tuned on only a few videos. I'm sorry about that.
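
For context on var_type: in DDPM-style codebases, 'fixed_small' conventionally means the reverse-step variance is not learned and is set to the posterior variance, while 'fixed_large' uses the betas directly. The sketch below shows that convention under an assumed linear beta schedule; whether this repository's schedule and clipping match exactly should be checked against diffusion_ddim.py.

```python
import numpy as np

betas = np.linspace(1e-4, 2e-2, 1000, dtype=np.float64)  # assumed schedule
alphas_cumprod = np.cumprod(1.0 - betas)
# abar_{t-1}, with abar_{-1} defined as 1 for the first step.
alphas_cumprod_prev = np.concatenate([[1.0], alphas_cumprod[:-1]])

# 'fixed_small': posterior variance of q(x_{t-1} | x_t, x_0).
posterior_variance = betas * (1.0 - alphas_cumprod_prev) / (1.0 - alphas_cumprod)

# 'fixed_large': the forward-process variance itself.
large_variance = betas

print(posterior_variance[[1, 500, 999]])
print(large_variance[[1, 500, 999]])
```

The posterior variance is never larger than the corresponding beta, which is where the "small" in the name comes from. Neither option adds a learned variance term, so with loss_type 'mse' the loss reduces to the noise-prediction error.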

Jul 10 '24 15:07 wangxiang1230

Hi, after testing, the model does not seem to perform well on side-facing views and turns. I think this is because the model cannot infer what the clothes look like from the back given only a front-facing image of the person. If I have three-view images of the person, can I add them to the ref_image list so the model has this information? Thank you.

Aug 12 '24 06:08 ak01user