
Multistep sampling in Algorithm 1

Open · liyy201912 opened this issue 2 years ago · 10 comments

Hi, thanks for the great work.

One question: it seems that the function linked below is different from Algorithm 1 in the paper. Any further clarification on multistep sampling? Thanks.

https://github.com/openai/consistency_models/blob/main/cm/karras_diffusion.py#L657
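
For reference, the loop in that function has roughly this shape (paraphrased from the linked file, not a verbatim copy; t_max_rho = t_max ** (1 / rho) and t_min_rho likewise):

    # np = numpy; distiller, generator, s_in as defined in the sampler
    for i in range(len(ts) - 1):
        t = (t_max_rho + ts[i] / (steps - 1) * (t_min_rho - t_max_rho)) ** rho
        x0 = distiller(x, t * s_in)  # denoise at the current level
        next_t = (t_max_rho + ts[i + 1] / (steps - 1) * (t_min_rho - t_max_rho)) ** rho
        next_t = np.clip(next_t, t_min, t_max)
        # re-noise toward the next level; note this also runs on the final iteration
        x = x0 + generator.randn_like(x) * np.sqrt(next_t**2 - t_min**2)
    return x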

liyy201912 commented Apr 15 '23

The stochastic sampler is the same, assuming you finish on t_min. In this case the scaling applied to the noise will be zero, so it finishes with the final denoised output.
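
For intuition, here is a minimal sketch of the noise scale applied after the final denoise, using the same schedule formula as the sampler (the constants are assumed repo defaults):

    import numpy as np

    t_min, t_max, rho, steps = 0.002, 80.0, 7.0, 40  # assumed repo defaults
    t_max_rho, t_min_rho = t_max ** (1 / rho), t_min ** (1 / rho)

    def noise_scale(last_index):
        # noise level for the last entry of ts on the Karras schedule
        t = (t_max_rho + last_index / (steps - 1) * (t_min_rho - t_max_rho)) ** rho
        t = np.clip(t, t_min, t_max)  # same guard as the sampler
        return np.sqrt(t**2 - t_min**2)

    print(noise_scale(39))  # ~0.0: ends on t_min, so the output stays clean
    print(noise_scale(25))  # ~0.63: a visibly noisy output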

thorinf commented Apr 18 '23

> The stochastic sampler is the same, assuming you finish on t_min. In this case the scaling applied to the noise will be zero, so it finishes with the final denoised output.

Hi, I have the same problem. If you finish with t_min then it is the same, but it seems it is not always necessary to finish with t_min? For example, if I use --sampler multistep --ts 0,10,25, I get very noisy results, which I think is caused by the noise added at line 680 of karras_diffusion.py.

Mingxiao-Li commented May 04 '24

Been a while since I've been in depth with this code, so this may be a naive question. Are you sure ts should be in ascending order? Having the larger t last may mean that you're adding a lot of noise on the final sampling step, whereas if it's reversed you don't add much noise and the result should be less noisy.

thorinf commented May 04 '24

> Are you sure ts should be in ascending order? Having the larger t last may mean that you're adding a lot of noise on the final sampling step.

I think ts should be in ascending order. I followed the instructions in scripts/launch.sh using ts=0,22,39 for ImageNet, and I get good results. But if I change to ts=0,22,25, I get a very noisy image. When the last index is 39, the scale of the noise added to the image in the last step is essentially zero, but if it is 25 the scale becomes large. This might be the reason I obtain these results. Is it possible that the stochastic sampler implemented in this codebase is incorrect? It seems different from what is explained in the paper.
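
Mapping a few indices through the schedule confirms this (a quick sanity check with the assumed repo defaults):

    t_min, t_max, rho, steps = 0.002, 80.0, 7.0, 40  # assumed repo defaults
    t_max_rho, t_min_rho = t_max ** (1 / rho), t_min ** (1 / rho)
    # indices ascend while the corresponding noise levels descend
    for idx in (0, 22, 25, 39):
        t = (t_max_rho + idx / (steps - 1) * (t_min_rho - t_max_rho)) ** rho
        print(idx, round(t, 3))  # 0 -> 80.0, 22 -> ~1.382, 25 -> ~0.625, 39 -> 0.002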

Mingxiao-Li commented May 04 '24

The ordering is different in the code compared to the algorithm in the paper, but they amount to the same thing:

    # same setup as stochastic_iterative_sampler: t_max_rho = t_max ** (1 / rho), etc.
    t = (t_max_rho + ts[0] / (steps - 1) * (t_min_rho - t_max_rho)) ** rho
    x = distiller(x, t * s_in)  # first denoise from the starting level (ts[0] = 0 -> t_max)
    for i in range(1, len(ts)):
        # re-noise the clean estimate up to the next (lower) level ts[i]
        t = (t_max_rho + ts[i] / (steps - 1) * (t_min_rho - t_max_rho)) ** rho
        t = np.clip(t, t_min, t_max)
        x = x + generator.randn_like(x) * np.sqrt(t**2 - t_min**2)
        x = distiller(x, t * s_in)  # denoise again; the loop ends on a denoise
    return x

This is (roughly) how the paper orders it: the trajectory ends on a denoising step, whereas the repo's loop ends by injecting noise whose scale is zero only when the final ts index is steps - 1 (i.e. t_min). End ts anywhere else and that last injection stays in the returned sample.

thorinf commented May 04 '24

> But if I change to ts=0,22,25, I get a very noisy image.

Did you also change --steps from 40 to 26 in your case? I tested the diffusers code with image = pipe(timesteps=[25, 22, 0], class_labels=class_id).images[0] (https://github.com/openai/consistency_models?tab=readme-ov-file#use-in--diffusers) and it works well.
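
For completeness, the full diffusers setup looks roughly like this (adapted from the README's example; the checkpoint name and class id here are illustrative, double-check against the README):

    import torch
    from diffusers import ConsistencyModelPipeline

    # checkpoint name as given in the README's diffusers section (assumed)
    pipe = ConsistencyModelPipeline.from_pretrained(
        "openai/diffusers-cd_imagenet64_l2", torch_dtype=torch.float16
    ).to("cuda")

    class_id = 145  # any ImageNet class id; illustrative
    # timesteps are passed in descending order; ending on an intermediate
    # level works well here, which suggests diffusers adds no trailing noise
    image = pipe(timesteps=[25, 22, 0], class_labels=class_id).images[0]
    image.save("sample.png")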

yuanzhi-zhu commented May 04 '24

> Did you also change --steps from 40 to 26 in your case?

https://github.com/openai/consistency_models/blob/6d26080c58244555c031dbc63080c0961af74200/cm/karras_diffusion.py#L657

I believe steps doesn't make a difference in this implementation. What they want to do is single-step generation; the results won't be great, but they should be distorted, not noisy.

thorinf commented May 04 '24

> Did you also change --steps from 40 to 26 in your case?

I did not change steps, so in my experiments steps is always 40. In the paper, the multistep sampling algorithm does not add noise to the sample after the last denoising step, but the stochastic_iterative_sampler function does. This is quite confusing. Also, ts is ascending, but after the calculation the corresponding t values are descending, which makes sense for the backward generation process.

Mingxiao-Li commented May 04 '24

I see that in launch.sh the multistep ts always ends with 39, which is the last step, so the scale of the noise added after the final denoise is actually zero.
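
If that reading is right, a possible (untested) workaround for stopping at an intermediate noise level is to append the final index so the trailing noise scale is zero, e.g. --ts 0,22,25,39: the last denoise then happens at the level for index 25, and the noise added afterwards at index 39 (t_min) has zero scale.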

Mingxiao-Li commented May 04 '24