video-diffusion-pytorch
the commit: one more residual
After updating to this commit, the color of the sampled video fades (I don't know why). I am using the UCF101 dataset, unconditional training with a 10k-step warmup.
Were you able to get UCF101 unconditional training to work? I am getting moving colored patches for the sampled video
> Were you able to get UCF101 unconditional training to work? I am getting moving colored patches for the sampled video
Yeah, just adjusting the U-Net and LR should work. I just use the ApplyEyeMakeup class for training. If you use the whole dataset, maybe train longer? I haven't tested on the full dataset yet.
I am trying the whole dataset and it gives random colored artifacts.
Has your loss converged? And maybe train longer? I train with the ApplyEyeMakeup folder and need 60-70k steps to get good 64×64 results, and 150k+ to get good 128×128 results. And the longer the better, so maybe you should just let it train longer, I guess?
Okay, makes sense! I trained with batch_size = 64 on 8 A100 GPUs for 20k iterations on the ApplyEyeMakeup folder (64×64) and got some results, but not great. LR = 1e-3; I also implemented skipping 1 frame like the original paper.
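The frame skipping mentioned here amounts to subsampling each clip before it reaches the model. A minimal sketch (`skip_frames` is a hypothetical helper, not part of the repo):

```python
def skip_frames(frames, stride=2):
    """Keep every `stride`-th frame; stride=2 drops one frame between
    kept frames, halving the effective frame rate of the clip."""
    return frames[::stride]

# a toy "clip" of 8 frame indices
print(skip_frames(list(range(8))))  # -> [0, 2, 4, 6]
```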
It takes more than 2-3 days to get to 20k iterations; this is super, super slow!!
Do you mind sharing what U-Net modifications, LR, and batch_size you used?
lr = 1e-4, batch_size = 8 (2× 1080 Ti), 10k cosine warmup, model = Unet3D(dim = 64, dim_mults = (1, 2, 4, 8, 8)). I don't find prob_focus_present matters a lot; 0.0 or 0.2 produce similar-quality results. You could give it a try?
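Put together, those settings map onto the repo's API roughly like this. This is a sketch, not a verified recipe: `num_frames`, the data path, and the exact `Trainer` keyword names are assumptions based on the README, so check them against your version (the 10k cosine warmup is also not built into the Trainer; you'd attach your own scheduler):

```python
from video_diffusion_pytorch import Unet3D, GaussianDiffusion, Trainer

model = Unet3D(
    dim = 64,
    dim_mults = (1, 2, 4, 8, 8)   # the extra 8 is the U-Net modification discussed above
)

diffusion = GaussianDiffusion(
    model,
    image_size = 64,              # 64x64 frames
    num_frames = 10,              # assumption: pick to match your clip length
    timesteps = 1000
)

trainer = Trainer(
    diffusion,
    './data/ApplyEyeMakeup',      # assumption: path to the extracted class folder
    train_batch_size = 8,
    train_lr = 1e-4,
    train_num_steps = 70000
)

trainer.train(prob_focus_present = 0.2)
```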
Well, it shouldn't be that slow. I can train 70k steps in less than 2 days (with DP accelerate).
Thank you, I can try these settings on 16 A100 GPUs for:
- the ApplyEyeMakeup folder
- the entire UCF101 dataset
I also set prob_focus_present = 0.2
> well, it shouldn't be that slow. I can train 70k using less than 2 days. (with DP accelerate)
Oh really? I just use nn.DataParallel and run the model.
Hmm, wondering if I can try DP accelerate as well then.
Also your learning rate seems high compared to mine. Interesting 1e-04 for batch_size = 8
Original paper is 3e-04 for batch_size = 128
> well, it shouldn't be that slow. I can train 70k using less than 2 days. (with DP accelerate)
Did you use accumulate_gradient=2?
Aha, I am sorry, I meant using DataParallel to accelerate. Sorry for making you confused.
Also, I guess batch_size = 8 for 70k iterations is approximately equivalent to batch_size = 64 for ~9k iterations (the same number of samples seen).
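The equivalence is just matching total samples seen; a quick check (hypothetical helper) gives batch 8 × 70k ≈ batch 64 × 8.75k:

```python
def equivalent_iters(batch_a, iters_a, batch_b):
    """Iterations at batch_b that process the same number of samples
    as iters_a iterations at batch_a."""
    return batch_a * iters_a / batch_b

print(equivalent_iters(8, 70_000, 64))  # -> 8750.0
```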
> Also your learning rate seems high compared to mine. Interesting 1e-04 for batch_size = 8
> Original paper is 3e-04 for batch_size = 128
1e-4 is smaller than your 1e-3. And the paper mentions their training settings?? Maybe I should check that out. Additionally, if you make your batch size smaller, shouldn't you make your LR smaller, not bigger?
Yes, the LR should be smaller, but I was thinking more like scaling it down linearly (which most blogs recommend):
https://stackoverflow.com/questions/53033556/how-should-the-learning-rate-change-as-the-batch-size-change
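Linear scaling, as discussed in that thread, would actually put the LR well below 1e-4 at batch 8. A sketch (`linear_scaled_lr` is a hypothetical helper):

```python
def linear_scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling rule: keep lr / batch_size constant."""
    return base_lr * new_batch / base_batch

# the paper's 3e-4 at batch 128, scaled down to batch 8
print(linear_scaled_lr(3e-4, 128, 8))  # -> 1.875e-05
```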
Yeah, I just do a 10k cosine warmup first, then decay linearly down from 1e-4. I tried a bigger LR (5e-4) when producing 128×128 and found it really hard to converge; the results were bad. Hope this helps.
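One plausible reading of that schedule, as a self-contained sketch: a half-cosine ramp over the first 10k steps, then linear decay from the 1e-4 peak. The function name, the 70k horizon, and the 0 floor are all assumptions:

```python
import math

def lr_at(step, peak_lr=1e-4, warmup_steps=10_000, total_steps=70_000, min_lr=0.0):
    if step < warmup_steps:
        # half-cosine ramp from 0 up to peak_lr over the warmup
        return peak_lr * 0.5 * (1.0 - math.cos(math.pi * step / warmup_steps))
    # then linear decay from peak_lr down to min_lr
    frac = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr + frac * (min_lr - peak_lr)

print(lr_at(5_000))   # mid-warmup  -> 5e-05
print(lr_at(10_000))  # peak        -> 0.0001
print(lr_at(70_000))  # end of run  -> 0.0
```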
Yes definitely, thank you!
Also, did you try "l1" vs "l2" loss? I noticed "l2" gives better results (it's also what the original paper used).
I use l1 and haven't had time to try l2; I will give it a try!
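For reference, the difference between the two options is mean absolute vs mean squared error; plain-Python sketches (the repo itself computes these with PyTorch ops):

```python
def l1_loss(pred, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def l2_loss(pred, target):
    """Mean squared error; penalizes large residuals more heavily."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

pred, target = [0.0, 2.0], [1.0, 0.0]
print(l1_loss(pred, target))  # -> 1.5
print(l2_loss(pred, target))  # -> 2.5
```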
Thanks, I will share some samples once I get good results!
looking forward to your results~
BTW, where did you find their training settings? I checked the paper and the github.io page, but couldn't find them 😢
It's on page 14 of the supplemental section ("Details and Hyper-parameters") of the paper.
Oh, damn. I read and saved their first-version paper, not knowing about their update. Thanks!!
@dhruv-nathawani have you achieved any good results? I selected 10 categories of UCF101 to train a 64×64 model, batch_size = 8. After training 600k steps, I got a few good results.
@martinriven can you show us some of your results?
@martinriven I got good results for ApplyEyeMakeup, but when I tried training with 3 different video action classes from UCF101 it did not work. Would you mind sharing your architecture (the parameters we discussed earlier) and results?
Here are some results for ApplyEyeMakeup (they seem to have overfit, which is expected when training with just 145 videos):