
DMTimg pre-training ablation

Open bhack opened this issue 2 years ago • 2 comments

Also, in the Section 4.2 ablation study of the paper, I can't find any clear evidence for why the network needs to be trained in two stages (DMTimg and DMTvid), especially because you have claimed:

Therefore, we train DMTvid from scratch instead of fine-tuning it on DMTimg

Was there any specific issue with unifying the loss into an end-to-end (E2E), single-stage approach?

bhack, Jul 19 '23 19:07

Please refer to the earlier sections of the paper (e.g., the introduction and related work) to better understand its key motivation (deficiency awareness). We have also made extensive efforts to train a unified end-to-end network with both the image and video objective functions and datasets. However, despite those efforts, matching the current high performance with such an approach remains challenging.
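
To make the two training regimes under discussion concrete, here is a generic formalization. It is an illustrative sketch, not the paper's actual losses: the parameters $\theta_{\mathrm{img}}$, $\theta_{\mathrm{vid}}$, $\theta$, the losses $\mathcal{L}_{\mathrm{img}}$, $\mathcal{L}_{\mathrm{vid}}$, the datasets $\mathcal{D}_{\mathrm{img}}$, $\mathcal{D}_{\mathrm{vid}}$ and the weights $\lambda_{\mathrm{img}}$, $\lambda_{\mathrm{vid}}$ are hypothetical placeholders. In the two-stage form, DMTvid starts from fresh parameters (trained from scratch, as the quoted sentence states), and its objective is written as possibly depending on the frozen stage-1 model $\theta^\star_{\mathrm{img}}$; that dependency is an assumption here, since the thread does not say how the stages interact. The single-stage alternative minimizes one weighted sum over shared parameters.

```latex
% Two-stage training (hypothetical notation)
\begin{aligned}
\theta^\star_{\mathrm{img}} &= \arg\min_{\theta_{\mathrm{img}}}
  \mathbb{E}_{x \sim \mathcal{D}_{\mathrm{img}}}\big[\mathcal{L}_{\mathrm{img}}(\theta_{\mathrm{img}}; x)\big] \\
\theta^\star_{\mathrm{vid}} &= \arg\min_{\theta_{\mathrm{vid}}}
  \mathbb{E}_{v \sim \mathcal{D}_{\mathrm{vid}}}\big[\mathcal{L}_{\mathrm{vid}}(\theta_{\mathrm{vid}}; v, \theta^\star_{\mathrm{img}})\big]
\end{aligned}

% Single-stage, end-to-end alternative (hypothetical notation)
\theta^\star = \arg\min_{\theta}\;
  \lambda_{\mathrm{img}}\, \mathbb{E}_{x \sim \mathcal{D}_{\mathrm{img}}}\big[\mathcal{L}_{\mathrm{img}}(\theta; x)\big]
  + \lambda_{\mathrm{vid}}\, \mathbb{E}_{v \sim \mathcal{D}_{\mathrm{vid}}}\big[\mathcal{L}_{\mathrm{vid}}(\theta; v)\big]
```

In the single-stage form both terms act on the same parameters $\theta$, so the result also depends on how $\lambda_{\mathrm{img}}$ and $\lambda_{\mathrm{vid}}$ balance them. The open question in this thread is whether that form can match the two-stage result.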

yeates, Jul 20 '23 02:07

Yes, I understand the deficiency-awareness issue. So it seems to me that the main problem, even if it is not stated explicitly in the paper, is training the network with a common objective function that covers both the deficiency/generative task and the more "classical" video inpainting task, right?

bhack, Jul 20 '23 11:07