Tune-A-Video

Question about training loss.

Open • Guanys-dar opened this issue 1 year ago • 2 comments

Thank you for your excellent work, which has been very inspiring to me. 

I have some questions about the loss function used to fine-tune the network in your paper. The paper says fine-tuning uses 'the same training objective in standard LDMs'. However, Figure 4 of the paper describes a pixel-wise reconstruction loss, which appears to be computed between the input video and the reconstructed video rather than on the predicted noise. Could you please clarify whether I am misunderstanding something?
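The two objectives the question contrasts can be compared with a toy sketch. This is not Tune-A-Video's code. It assumes the standard DDPM forward process x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, uses short Python lists in place of (latent) video tensors, and all function names are hypothetical. It shows that if the "reconstructed" video is derived from the predicted noise, the reconstruction loss equals the noise-prediction loss scaled by (1 - alpha_bar) / alpha_bar. Under that assumption, the two objectives differ only by a per-timestep weight.

```python
import math
import random

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def add_noise(x0, eps, alpha_bar):
    """Assumed DDPM forward process: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps."""
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [s * x + n * e for x, e in zip(x0, eps)]

def noise_prediction_loss(eps, eps_pred):
    """Standard LDM-style objective: regress the noise that was added."""
    return mse(eps, eps_pred)

def reconstruction_loss(x0, x_t, eps_pred, alpha_bar):
    """Hypothetical reconstruction loss: invert the forward process with the
    predicted noise to get x0_hat, then compare x0_hat with the clean input."""
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x0_hat = [(xt - n * ep) / s for xt, ep in zip(x_t, eps_pred)]
    return mse(x0, x0_hat)

random.seed(0)
x0 = [random.gauss(0, 1) for _ in range(16)]        # stand-in for a (latent) frame
eps = [random.gauss(0, 1) for _ in range(16)]       # true noise
eps_pred = [e + random.gauss(0, 0.1) for e in eps]  # hypothetical imperfect denoiser
alpha_bar = 0.7                                     # assumed noise-schedule value

x_t = add_noise(x0, eps, alpha_bar)
l_eps = noise_prediction_loss(eps, eps_pred)
l_x0 = reconstruction_loss(x0, x_t, eps_pred, alpha_bar)
weight = (1.0 - alpha_bar) / alpha_bar

# l_x0 and weight * l_eps agree up to floating-point error.
print(l_eps, l_x0, weight * l_eps)
```

The equality follows from x0_hat - x0 = sqrt((1 - alpha_bar) / alpha_bar) * (eps - eps_pred). In an LDM, x0 would be the VAE latent, so a truly pixel-wise loss would also require decoding, which this sketch does not model.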

Guanys-dar, Sep 03 '23 08:09

I suspect their fine-tuned network is trained exactly this way.

henbucuoshanghai, Feb 18 '24 08:02

I have the same question! Please help!

DI-LEE, Oct 02 '24 14:10