SDF-Diffusion icon indicating copy to clipboard operation
SDF-Diffusion copied to clipboard

DDIM sampling implementation considerations

Open andreabosisio opened this issue 10 months ago • 6 comments

I'm not sure whether this is correct, as—when ddim_S > 1—the previous timestep in case of DDIM (t_prev = t - ddim_S) is not the same as the one of DDPM (t_prev = t - 1). https://github.com/Kitsunetic/SDF-Diffusion/blob/2bc5fe0efe55ad8ebfee5b59cb71688ff7850893/src/models/diffusion.py#L344

Also, I'd like to understand why the UNet model receives as input noise_level = self.sqrt_alphas_cumprod_prev[ts] instead of ts itself here https://github.com/Kitsunetic/SDF-Diffusion/blob/2bc5fe0efe55ad8ebfee5b59cb71688ff7850893/src/models/diffusion.py#L346

Thank you in advance for your collaboration

andreabosisio avatar Mar 03 '25 11:03 andreabosisio

Q1

I'm sorry, but I didn't quite understand your question. Could you please provide more details?

Q2

The variable ts is typically processed using something like positional embeddings since it is an integer and needs to be converted into a float. Some approaches use nn.Embedding, while others utilize sinusoidal positional embeddings. Using sqrt_alphas_cumprod_prev follows a similar principle.

Kitsunetic avatar Mar 03 '25 11:03 Kitsunetic

Q1

I'm sorry, but I didn't quite understand your question. Could you please provide more details?

self.sqrt_alphas_cumprod_prev is indexed for DDPM (i.e., t - t_prev = 1), meaning that self.sqrt_alphas_cumprod_prev[t] gives the the value at tilmestep t-1, whereas for DDIM one would need to have the value at tilmestep t-ddim_S.

andreabosisio avatar Mar 03 '25 16:03 andreabosisio

https://github.com/Kitsunetic/SDF-Diffusion/blob/2bc5fe0efe55ad8ebfee5b59cb71688ff7850893/src/models/diffusion.py#L341-L344

Based on sample_ddim function, the ts is sampled iteratively with a specific step size, where the step size would be larger than 1 when ddim_S > 1.

Kitsunetic avatar Mar 03 '25 23:03 Kitsunetic

Yes, I'm aware of that, but the issue I'm pointing out is this: imagine the current timestep is step=500, and according to the DDIM schedule, the previous timestep is t_prev=450. If I'm not mistaken, self.sqrt_alphas_cumprod_prev[step] would actually give the value at timestep 499, while the value we need is the one at timestep t_prev=450. This happens because self.sqrt_alphas_cumprod_prev is shifted by one timestep, as defined here https://github.com/Kitsunetic/SDF-Diffusion/blob/2bc5fe0efe55ad8ebfee5b59cb71688ff7850893/src/models/diffusion.py#L136 where alphas_cumprod is computed on the betas array, which is in turn defined as a DDPM schedule with step 1, as schedule_opt["n_timestep"] in https://github.com/Kitsunetic/SDF-Diffusion/blob/2bc5fe0efe55ad8ebfee5b59cb71688ff7850893/src/models/diffusion.py#L125-L127 contains the total number of DDPM timesteps.

andreabosisio avatar Mar 04 '25 08:03 andreabosisio

That’s interesting point. I think synchronizing sqrt_alphas_cumprod_prev when step=0 is important. For example, when DDPM step size is 1000 and DDIM step size is 20:

DDPM step prev DDIM step (current implementation) prev DDIM step (your implementation) prev
999 998 950 949 950 900
998 997 900 899 900 850
     
1 0 50 49 50 0
0 -1 0 -1 0 -50

Kitsunetic avatar Mar 04 '25 13:03 Kitsunetic

An easy way to handle the step=0 case is as they do in here https://github.com/huggingface/diffusers/blob/560fb5f4d65b8593c13e4be50a59b1fd9c2d9992/src/diffusers/schedulers/scheduling_ddim.py#L406.

Anyway, I think the issue is there also for all the other steps, right? I've just run it with n_timesteps=1000 and ddim_S=50, and at the first iteration, step=981 and the previous timestep (i.e., the next value in the time_range array) is 961, as expected. However, by how the self.sqrt_alphas_cumprod_prev array is defined, with self.sqrt_alphas_cumprod_prev[step] we are getting the value $$\prod_{i=1}^{980} \alpha_i$$ and not $$\prod_{i=1}^{961} \alpha_i$$.

I hope this clarifies my concern.

andreabosisio avatar Mar 04 '25 13:03 andreabosisio