TrackDiffusion
May I ask how many iterations it takes for Stage 1 and Stage 2 respectively to converge on YouTube-VIS?
By the way, is it possible to train the model without Stage 1? Would training still converge in that case?
Moreover, I find that the Instance-Enhancer is missing from the training and inference code. Is it necessary for the SVD version?
Hi @12shuai, I want to confirm: is your base model SVD or ModelScope's T2V?
SVD