
the commit: one more residual

liangbingzhao opened this issue 1 year ago • 23 comments

After updating to this commit, the color of the sampled video fades (I don't know why). I am using the UCF101 dataset, unconditional training with a 10k-step warmup.

liangbingzhao avatar Jul 21 '22 14:07 liangbingzhao

Were you able to get UCF101 unconditional training to work? I am getting moving colored patches in the sampled videos.

dhruv-nathawani avatar Jul 28 '22 18:07 dhruv-nathawani

> Were you able to get UCF101 unconditional training to work? I am getting moving colored patches in the sampled videos.

Yeah, just adjusting the U-Net and LR should work. I only use the ApplyEyeMakeup class for training. If you use the whole dataset, maybe train longer? I haven't tested on the whole dataset yet.

liangbingzhao avatar Jul 29 '22 01:07 liangbingzhao

I am trying the whole dataset and it gives random colored artifacts.

dhruv-nathawani avatar Jul 29 '22 18:07 dhruv-nathawani

Has your loss converged? And maybe train longer? I train with the ApplyEyeMakeup folder; it needs 60-70k steps to get good 64×64 results and 150k+ steps to get good 128×128 results, and the longer the better. So maybe you should let it train longer, I guess?

liangbingzhao avatar Aug 11 '22 06:08 liangbingzhao

Okay, makes sense! I trained with batch_size = 64 on 8 A100 GPUs for 20k iterations on the ApplyEyeMakeup folder (64×64) and got some results, but not great. LR = 1e-03; I also implemented skipping every other frame like the original paper (see the sketch below).
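
The frame skipping is just subsampling each clip before training; a minimal sketch (`skip_frames` is a hypothetical helper, not part of this repo):

```python
import torch

def skip_frames(video: torch.Tensor, stride: int = 2) -> torch.Tensor:
    # video is (channels, frames, height, width); keeping every `stride`-th
    # frame halves the effective frame rate for stride = 2 ("skip 1 frame")
    return video[:, ::stride]
```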

It takes more than 2-3 days to get to 20k iterations; this is super, super slow!!!

Do you mind sharing which U-Net modifications, LR, and batch_size you used?

dhruv-nathawani avatar Aug 11 '22 06:08 dhruv-nathawani

lr = 1e-4, batch_size = 8 (2× 1080 Ti), 10k cosine warmup, model = Unet3D(dim = 64, dim_mults = (1, 2, 4, 8, 8)). I don't find that prob_focus_present matters a lot; setting it to 0.0 or 0.2 produces similar-quality results. You could give it a try? Putting it all together, the setup looks roughly like the sketch below.
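
(A rough sketch of that configuration against this repo's README API; num_frames is a placeholder since the clip length wasn't stated, and the 10k cosine warmup needs a custom scheduler on top of the stock Trainer, which keeps a fixed LR.)

```python
import torch
from video_diffusion_pytorch import Unet3D, GaussianDiffusion, Trainer

model = Unet3D(
    dim = 64,
    dim_mults = (1, 2, 4, 8, 8)    # one stage deeper than the README default
)

diffusion = GaussianDiffusion(
    model,
    image_size = 64,               # 64x64 frames
    num_frames = 10,               # placeholder: clip length not stated in this thread
    timesteps = 1000,
    loss_type = 'l1'
)

trainer = Trainer(
    diffusion,
    './data/ApplyEyeMakeup',       # folder of training gifs
    train_batch_size = 8,
    train_lr = 1e-4,               # warmup/decay schedule applied separately, see below
    train_num_steps = 70000,
    gradient_accumulate_every = 2,
    ema_decay = 0.995,
    amp = True
)

# prob_focus_present is passed through at train time,
# e.g. trainer.train(prob_focus_present = 0.2)
trainer.train()
```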

liangbingzhao avatar Aug 11 '22 06:08 liangbingzhao

Well, it shouldn't be that slow. I can train 70k steps in less than 2 days (with DP accelerate).

liangbingzhao avatar Aug 11 '22 06:08 liangbingzhao

Thank you, I can try these settings on 16 A100 GPUs for:

  1. the ApplyEyeMakeup folder
  2. the entire UCF101 dataset

I also set prob_focus_present = 0.2

dhruv-nathawani avatar Aug 11 '22 06:08 dhruv-nathawani

> Well, it shouldn't be that slow. I can train 70k steps in less than 2 days (with DP accelerate).

Oh really? I just use nn.DataParallel and run the model.

Hmm, wondering if I can try DP accelerate as well then.

dhruv-nathawani avatar Aug 11 '22 06:08 dhruv-nathawani

Also, your learning rate seems high for your batch size compared to mine. Interesting, 1e-04 for batch_size = 8.

The original paper uses 3e-04 for batch_size = 128.

dhruv-nathawani avatar Aug 11 '22 06:08 dhruv-nathawani

> Well, it shouldn't be that slow. I can train 70k steps in less than 2 days (with DP accelerate).

Did you use accumulate_gradient=2?

dhruv-nathawani avatar Aug 11 '22 06:08 dhruv-nathawani

Aha, I am sorry, I meant using DataParallel to accelerate. Sorry for making you confused.
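
For reference, a minimal sketch of what I mean, assuming a hand-rolled training loop around the diffusion model rather than the built-in Trainer (`dataloader` is a hypothetical loader yielding (batch, channels, frames, height, width) tensors):

```python
import torch
from torch import nn

# wrap the diffusion model so each batch is split across the visible GPUs
model = nn.DataParallel(diffusion).cuda()   # `diffusion` as constructed above
optimizer = torch.optim.Adam(model.parameters(), lr = 1e-4)

for videos in dataloader:
    loss = model(videos.cuda()).mean()      # .mean() folds the per-GPU losses together
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```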

liangbingzhao avatar Aug 11 '22 06:08 liangbingzhao

Also, I guess batch_size = 8 for 70k iterations is approximately equivalent to batch_size = 64 for 10k iterations (about 560k vs. 640k samples seen).

dhruv-nathawani avatar Aug 11 '22 06:08 dhruv-nathawani

> Also, your learning rate seems high for your batch size compared to mine. Interesting, 1e-04 for batch_size = 8.
>
> The original paper uses 3e-04 for batch_size = 128.

1e-4 is smaller than your 1e-3. And the paper mentions their training settings?? Maybe I should check it out. Additionally, if you make your batch size smaller, shouldn't you make your LR smaller, not bigger?

liangbingzhao avatar Aug 11 '22 06:08 liangbingzhao

Yes, the LR should be smaller, but I was thinking more like scaling it down linearly with batch size (which most blogs recommend); see the snippet after the link below.

https://stackoverflow.com/questions/53033556/how-should-the-learning-rate-change-as-the-batch-size-change
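
Under that linear rule, taking the paper's 3e-04 at batch_size = 128 as the reference point:

```python
# linear scaling rule: lr shrinks proportionally with batch size
base_lr, base_batch = 3e-4, 128   # the paper's reference setting

for batch_size in (8, 64):
    lr = base_lr * batch_size / base_batch
    print(f"batch_size = {batch_size:3d} -> lr = {lr:.2e}")

# batch_size =   8 -> lr = 1.88e-05
# batch_size =  64 -> lr = 1.50e-04
```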

dhruv-nathawani avatar Aug 11 '22 06:08 dhruv-nathawani

Yeah, I just do a 10k cosine warmup first, then decay linearly down from 1e-4. I tried a bigger LR (5e-4) when producing 128×128 and found it really hard to converge; the results were bad. Hope this helps you. The schedule looks roughly like the sketch below.
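
A sketch of that schedule using LambdaLR (the exact warmup shape and total_steps here are assumptions):

```python
import math
import torch

warmup_steps, total_steps = 10_000, 150_000   # total_steps is a placeholder

def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        # cosine ramp from 0 up to the base LR over the first 10k steps
        return 0.5 * (1 - math.cos(math.pi * step / warmup_steps))
    # then decay linearly from the base LR toward 0
    return max(0.0, 1 - (step - warmup_steps) / (total_steps - warmup_steps))

optimizer = torch.optim.Adam(diffusion.parameters(), lr = 1e-4)   # base LR
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# call scheduler.step() once per training step
```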

liangbingzhao avatar Aug 11 '22 07:08 liangbingzhao

Yes definitely, thank you!

Also, did you try "l1" vs "l2" loss? I noticed "l2" gives better results (it's also what the original paper uses); switching is shown in the sketch below.
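
In this repo the loss is just a constructor flag, so switching is one line (assuming the README's GaussianDiffusion signature):

```python
diffusion = GaussianDiffusion(
    model,
    image_size = 64,
    num_frames = 10,   # placeholder, as above
    timesteps = 1000,
    loss_type = 'l2'   # 'l1' or 'l2'
)
```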

dhruv-nathawani avatar Aug 11 '22 07:08 dhruv-nathawani

I use l1 and haven't had time to try l2. I will give it a try!

liangbingzhao avatar Aug 11 '22 07:08 liangbingzhao

Thanks, I will share some samples once I get good results!

dhruv-nathawani avatar Aug 11 '22 07:08 dhruv-nathawani

looking forward to your results~

liangbingzhao avatar Aug 11 '22 07:08 liangbingzhao

Btw, where did you find their training settings? I checked the paper and the github.io page and couldn't find them 😢

liangbingzhao avatar Aug 11 '22 07:08 liangbingzhao

It's on page 14 of the supplementary section ("Details and Hyper-parameters") of the paper.

dhruv-nathawani avatar Aug 11 '22 07:08 dhruv-nathawani

Oh, damn. I read and saved the first version of their paper, not knowing about the update. Thx!!

liangbingzhao avatar Aug 11 '22 07:08 liangbingzhao

@dhruv-nathawani have you achieved any good results? I selected 10 categories of UCF101 to train a 64×64 model with batch_size = 8. After training for 600k steps, I only got a few good results.

liangbingzhao avatar Aug 27 '22 07:08 liangbingzhao

@martinriven can you show us some of your results?

oxjohanndiep avatar Aug 28 '22 22:08 oxjohanndiep

@martinriven I got good results for ApplyEyeMakeup, but when I tried training with 3 different video actions from UCF101 it did not work. Would you mind sharing your architecture (the parameters we discussed earlier) and results?

Here are some results for ApplyEyeMakeup (they seem to have overfit, which is expected when training with just 145 videos):

[three sample results attached: 114, 164, 168]

dhruv-nathawani avatar Aug 29 '22 17:08 dhruv-nathawani