DMT
DMTimg pre-training ablation
In the ablation study (Section 4.2) of the paper, I can't find a clear explanation of why the network needs to be trained in two stages (DMTimg and DMTvid), especially since you state:
Therefore, we train DMTvid from scratch instead of fine-tuning it on DMTimg
Was there a specific issue that prevented unifying the losses into a single-stage, end-to-end (E2E) approach?