add noise scheduler
Hi, I am trying to fine-tune the model on my own dataset, but I am not very familiar with how diffusion models are trained. Where in the code can I find the method/scheduler that adds noise? I ask because I noticed that the noise predicted by the denoising UNet goes through fairly complex calculations to produce the predicted VAE features of the video frames.
Hi, thanks for your attention. You can refer to this loss function (https://github.com/ali-vilab/UniAnimate/blob/549ee5fad7618500790929b0ae73151d36649045/tools/modules/diffusions/diffusion_ddim.py#L381) for more details.
Hi, the model_kwargs look like the model_kwargs used at inference. Can you tell me what x0 is? Is it the VAE-encoded features of the ground-truth frames (gt_frame), e.g. with shape 1,sq,4,96,64?
Hi, sorry for the late reply. x0 means the original clean video vae latents. t is the timestep.
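For readers new to diffusion training, here is a minimal sketch of how x0 and t are typically used in a training step. It is not the repository's code: it uses NumPy in place of torch tensors, assumes a linear beta schedule and a plain noise-prediction target, and the names `linear_betas`, `q_sample`, and `mse_noise_loss` are hypothetical. The linked loss function may use a different schedule or prediction target, so treat this only as an illustration of the idea.

```python
import numpy as np

def linear_betas(num_timesteps=1000, beta_start=1e-4, beta_end=2e-2):
    # Assumed linear schedule; the repository's config may use another one.
    return np.linspace(beta_start, beta_end, num_timesteps, dtype=np.float64)

def q_sample(x0, t, alphas_cumprod, noise):
    # Forward (noising) process, one timestep index per batch element:
    #   x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise
    shape = (-1,) + (1,) * (x0.ndim - 1)  # broadcast over all non-batch dims
    sqrt_ab = np.sqrt(alphas_cumprod[t]).reshape(shape)
    sqrt_one_minus_ab = np.sqrt(1.0 - alphas_cumprod[t]).reshape(shape)
    return sqrt_ab * x0 + sqrt_one_minus_ab * noise

def mse_noise_loss(pred_noise, noise):
    # 'mse' loss against the sampled noise (assumes a noise-prediction target).
    return float(np.mean((pred_noise - noise) ** 2))

rng = np.random.default_rng(0)
betas = linear_betas()
alphas_cumprod = np.cumprod(1.0 - betas)

# x0: clean video VAE latents; shape follows the 1,sq,4,96,64 example, with sq=16.
x0 = rng.standard_normal((1, 16, 4, 96, 64))
t = rng.integers(0, len(betas), size=(x0.shape[0],))  # random timestep per sample
noise = rng.standard_normal(x0.shape)
xt = q_sample(x0, t, alphas_cumprod, noise)  # noisy latents fed to the UNet

pred_noise = np.zeros_like(noise)  # stand-in for the UNet output
loss = mse_noise_loss(pred_noise, noise)
```

In a real training loop, `pred_noise` would come from the UNet called with `xt`, `t`, and the conditioning in model_kwargs.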
No problem, you are very kind. The good news is that I did manage to fine-tune on my own dataset: one person's dancing video with 340 frames. I am training only the parameters of two blocks, ['local_image_embedding', 'local_image_embedding_after']. Looking forward to good results. Thanks very much.
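A rough sketch of how such a selection can be done by parameter name. The helper `select_trainable` and the example parameter names are hypothetical, not taken from the repository. It matches on whole dotted path components, so each listed module name selects only its own parameters.

```python
def select_trainable(param_names, trainable_modules):
    """Return the parameter names that belong to one of the listed modules."""
    selected = []
    for name in param_names:
        # Compare whole path components, e.g. 'a.b.weight' -> ['a', 'b', 'weight'],
        # so 'local_image_embedding' does not match 'local_image_embedding_after'.
        parts = name.split(".")
        if any(module in parts for module in trainable_modules):
            selected.append(name)
    return selected

trainable_modules = ["local_image_embedding", "local_image_embedding_after"]

# Hypothetical parameter names, for illustration only.
param_names = [
    "input_blocks.0.0.weight",
    "local_image_embedding.0.weight",
    "local_image_embedding_after.2.bias",
    "out.2.weight",
]
trainable = select_trainable(param_names, trainable_modules)

# With a PyTorch model, the same selection could be applied as:
#   names = [n for n, _ in model.named_parameters()]
#   keep = set(select_trainable(names, trainable_modules))
#   for n, p in model.named_parameters():
#       p.requires_grad_(n in keep)
```

Only the parameters left with `requires_grad=True` would then be passed to the optimizer.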
Hi, I tried fine-tuning on a specific person's data, but the result was not very good. How should I modify it? I noticed that loss_type is 'mse' and var_type is 'fixed_small'. Is that right?
Hi, you are right: loss_type is 'mse' and var_type is 'fixed_small'. If you want to transfer the model to other domains, you may need to collect more data for training. Alternatively, you can try training only a subset of the parameters, but I'm not sure that will work. I trained the model on ~10K videos, so I don't know how it will behave when fine-tuned on only a few videos. I'm sorry about that.
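As background on the var_type setting: in DDPM-style code, 'fixed_small' conventionally means the reverse-step variance is fixed to the posterior variance of q(x_{t-1} | x_t, x0) rather than learned. The sketch below assumes the repository follows that convention and uses an assumed linear beta schedule. The function name `posterior_variance` is illustrative.

```python
import numpy as np

def posterior_variance(betas):
    # Posterior variance: beta_t * (1 - abar_{t-1}) / (1 - abar_t), with abar_{-1} = 1.
    alphas_cumprod = np.cumprod(1.0 - betas)
    alphas_cumprod_prev = np.concatenate([[1.0], alphas_cumprod[:-1]])
    return betas * (1.0 - alphas_cumprod_prev) / (1.0 - alphas_cumprod)

betas = np.linspace(1e-4, 2e-2, 1000)  # assumed schedule, for illustration
var_small = posterior_variance(betas)  # the 'fixed_small' convention
var_large = betas                      # the 'fixed_large' convention, for comparison
```

Because this variance is fixed by the schedule, it only affects sampling. With loss_type 'mse', the training loss does not depend on it.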
Hi, after testing, the model does not seem to perform well on side faces and turns. I think this is because the model cannot infer what the clothes look like from behind when given only a front-facing image of the person. Assuming I have three-view images of the person, can I put them in the ref_image list so that the model has this information? Thank you.