ADD student's timesteps

Open lbdave94 opened this issue 1 year ago • 6 comments

Hi! I was reading your paper "Adversarial Diffusion Distillation" (great work!), but it's not clear to me how to set the timesteps for the student. You state that the student samples from a set T_student = {t_1, ..., t_N} of N chosen timesteps, where N = 4, and that t_N must equal 1000 to enforce zero terminal SNR. But how are the other steps decided? Randomly in [1, 1000], uniformly in [1, 1000], or did you choose them experimentally?

Moreover, how long does it take the student to converge? Could you please share more training details?

lbdave94 avatar Dec 05 '23 15:12 lbdave94

I have the same question. The paper does not specify which 4 timesteps were used for training the student.

betterze avatar Jan 11 '24 00:01 betterze

@lbdave94 Have you found the answers? thx a lot

betterze avatar Jan 11 '24 00:01 betterze

@betterze hi, no I didn't. IMO they should be picked carefully, with special attention to the noise level. Since the student learns to generate the image in a single step starting from noise, the image is likely to have artifacts, so it's convenient to have the first noise level close to the maximum, but one that still lets some information through (e.g. 700-600). The other two noise levels could be smaller, so they just refine the image. So the steps could be something like {1000, 600, 200, 50}.
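To make the idea concrete, here is a minimal sketch of how a hand-picked set like that could be used during student training. The set below is my guess from this thread, not the values from the paper; `sample_student_timestep` is a hypothetical helper, not part of the released code:

```python
import random

# Hypothetical hand-picked student timesteps (a guess, NOT from the paper).
# t_N = 1000 would enforce zero terminal SNR; the smaller ones refine the image.
STUDENT_TIMESTEPS = [1000, 600, 200, 50]

def sample_student_timestep(rng=random):
    """Uniformly pick one of the N student timesteps for a training iteration."""
    return rng.choice(STUDENT_TIMESTEPS)

# At each training step the student would denoise at one of these levels:
t = sample_student_timestep()
print(t in STUDENT_TIMESTEPS)  # True
```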

lbdave94 avatar Jan 12 '24 10:01 lbdave94

I think the timesteps are evenly distributed. Taking T = 1000 as an example, with four timesteps for the student they would be 999, 749, 499, 249. In the inference phase the released checkpoint uses a standard "EulerDiscreteScheduler" sampler, and evenly distributed timesteps are exactly what you get if you set the number of inference steps to 4.
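For reference, a minimal sketch of how that spacing can be computed. This assumes the "leading"-style spacing (stride T // num_steps, shifted to end on T - 1) that reproduces 999, 749, 499, 249; I haven't verified this is exactly what the shipped scheduler config uses:

```python
import numpy as np

def evenly_spaced_timesteps(num_train_timesteps: int = 1000,
                            num_inference_steps: int = 4):
    """Evenly spaced timesteps, high noise -> low noise.

    With T = 1000 and 4 steps the stride is 250, giving [999, 749, 499, 249].
    """
    step = num_train_timesteps // num_inference_steps
    ts = np.arange(num_inference_steps) * step  # [0, 250, 500, 750]
    return (ts + step - 1)[::-1].tolist()       # [999, 749, 499, 249]

print(evenly_spaced_timesteps())  # [999, 749, 499, 249]
```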

ascust avatar Jan 29 '24 10:01 ascust

Hi, the paper says that the step size of the teacher model is set to 1. I think this is unreasonable. I tried using a DDPM trained on CIFAR10 to run ADD experiments: when the teacher model samples with a single step, the result is an image of completely random noise. Or is their teacher model already good enough to generate higher-quality images in one step?

digbangbang avatar Feb 22 '24 08:02 digbangbang

@digbangbang the teacher model is a classic DM that has already learned to go from pure noise to a clean image. It's correct that it has 1000 steps. The problem with the teacher is that it is too slow at inference; that's why they want to train a student that produces the same result in just 4 steps.
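A toy illustration of that cost gap (not the real ADD models): `denoise_step` below is a dummy stand-in for one network forward pass, so the only point being made is that the teacher pays ~1000 forward passes per image while the student pays 4:

```python
import numpy as np

def denoise_step(x, t):
    # Dummy "denoiser": shrink toward zero (a clean all-zeros "image").
    return x * (t / (t + 1))

def sample(timesteps, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(4)   # start from pure noise
    calls = 0
    for t in timesteps:          # iterate high noise -> low noise
        x = denoise_step(x, t)
        calls += 1
    return x, calls

_, teacher_calls = sample(range(999, -1, -1))    # full 1000-step schedule
_, student_calls = sample([999, 749, 499, 249])  # 4-step student schedule
print(teacher_calls, student_calls)  # 1000 4
```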

lbdave94 avatar Feb 22 '24 09:02 lbdave94