Tune-A-Video Question about training loss.

Open Guanys-dar opened this issue 1 year ago • 2 comments

Thank you for your excellent work, which has been very inspiring to me.

I have a question about the loss function used to fine-tune the network. The paper says that fine-tuning uses 'the same training objective in standard LDMs'. However, Figure 4 of the paper indicates that the network is trained with a pixel-wise reconstruction loss, which appears to be computed between the input video and the reconstructed video rather than on the predicted noise. Could you please clarify whether I am misunderstanding something?

Sep 03 '23 08:09 Guanys-dar

I would guess that is how their fine-tuned network is trained.

Feb 18 '24 08:02 henbucuoshanghai

I have the same question! Please help!

Oct 02 '24 14:10 DI-LEE
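
For readers comparing the two losses raised in the question, the sketch below writes them side by side. It is not an answer from the authors, and nothing in this thread confirms which one the code uses. The notation is an assumption borrowed from the usual DDPM/LDM formulation: $\mathcal{E}$ is the VAE encoder, $z_0 = \mathcal{E}(x)$ the clean latent, $c$ the text prompt, $\epsilon_\theta$ the denoising network, and $\bar\alpha_t$ the cumulative noise schedule.

```latex
% Standard LDM objective (noise prediction):
\mathcal{L}_{\mathrm{LDM}}
  = \mathbb{E}_{z_0,\; \epsilon \sim \mathcal{N}(0, I),\; t}
    \left[ \lVert \epsilon - \epsilon_\theta(z_t, t, c) \rVert_2^2 \right],
\qquad
z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon .

% Clean latent implied by the predicted noise:
\hat z_0 = \frac{z_t - \sqrt{1 - \bar\alpha_t}\, \epsilon_\theta(z_t, t, c)}{\sqrt{\bar\alpha_t}} .

% A reconstruction loss on that latent is the noise loss with a
% timestep-dependent weight:
\lVert z_0 - \hat z_0 \rVert_2^2
  = \frac{1 - \bar\alpha_t}{\bar\alpha_t}\,
    \lVert \epsilon - \epsilon_\theta(z_t, t, c) \rVert_2^2 .
```

Under these assumptions, a reconstruction loss in latent space and the noise-prediction loss differ only by a per-timestep weight, so a figure labelled "reconstruction loss" could be a schematic for the same objective. A loss computed in pixel space after decoding would not reduce to this form, and whether Figure 4 means that remains the open question here.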
