DDIM sampling implementation considerations
I'm not sure whether this is correct, as—when ddim_S > 1—the previous timestep in case of DDIM (t_prev = t - ddim_S) is not the same as the one of DDPM (t_prev = t - 1).
https://github.com/Kitsunetic/SDF-Diffusion/blob/2bc5fe0efe55ad8ebfee5b59cb71688ff7850893/src/models/diffusion.py#L344
Also, I'd like to understand why the UNet model receives as input noise_level = self.sqrt_alphas_cumprod_prev[ts] instead of ts itself here https://github.com/Kitsunetic/SDF-Diffusion/blob/2bc5fe0efe55ad8ebfee5b59cb71688ff7850893/src/models/diffusion.py#L346
Thank you in advance for your collaboration
Q1
I'm sorry, but I didn't quite understand your question. Could you please provide more details?
Q2
The variable ts is typically processed using something like positional embeddings since it is an integer and needs to be converted into a float.
Some approaches use nn.Embedding, while others utilize sinusoidal positional embeddings.
Using sqrt_alphas_cumprod_prev follows a similar principle.
Q1
I'm sorry, but I didn't quite understand your question. Could you please provide more details?
self.sqrt_alphas_cumprod_prev is indexed for DDPM (i.e., t - t_prev = 1), meaning that self.sqrt_alphas_cumprod_prev[t] gives the the value at tilmestep t-1, whereas for DDIM one would need to have the value at tilmestep t-ddim_S.
https://github.com/Kitsunetic/SDF-Diffusion/blob/2bc5fe0efe55ad8ebfee5b59cb71688ff7850893/src/models/diffusion.py#L341-L344
Based on sample_ddim function, the ts is sampled iteratively with a specific step size, where the step size would be larger than 1 when ddim_S > 1.
Yes, I'm aware of that, but the issue I'm pointing out is this: imagine the current timestep is step=500, and according to the DDIM schedule, the previous timestep is t_prev=450. If I'm not mistaken, self.sqrt_alphas_cumprod_prev[step] would actually give the value at timestep 499, while the value we need is the one at timestep t_prev=450. This happens because self.sqrt_alphas_cumprod_prev is shifted by one timestep, as defined here https://github.com/Kitsunetic/SDF-Diffusion/blob/2bc5fe0efe55ad8ebfee5b59cb71688ff7850893/src/models/diffusion.py#L136
where alphas_cumprod is computed on the betas array, which is in turn defined as a DDPM schedule with step 1, as schedule_opt["n_timestep"] in https://github.com/Kitsunetic/SDF-Diffusion/blob/2bc5fe0efe55ad8ebfee5b59cb71688ff7850893/src/models/diffusion.py#L125-L127 contains the total number of DDPM timesteps.
That’s interesting point.
I think synchronizing sqrt_alphas_cumprod_prev when step=0 is important.
For example, when DDPM step size is 1000 and DDIM step size is 20:
| DDPM step | prev | DDIM step (current implementation) | prev | DDIM step (your implementation) | prev |
|---|---|---|---|---|---|
| 999 | 998 | 950 | 949 | 950 | 900 |
| 998 | 997 | 900 | 899 | 900 | 850 |
| … | … | … | |||
| 1 | 0 | 50 | 49 | 50 | 0 |
| 0 | -1 | 0 | -1 | 0 | -50 |
An easy way to handle the step=0 case is as they do in here https://github.com/huggingface/diffusers/blob/560fb5f4d65b8593c13e4be50a59b1fd9c2d9992/src/diffusers/schedulers/scheduling_ddim.py#L406.
Anyway, I think the issue is there also for all the other steps, right?
I've just run it with n_timesteps=1000 and ddim_S=50, and at the first iteration, step=981 and the previous timestep (i.e., the next value in the time_range array) is 961, as expected. However, by how the self.sqrt_alphas_cumprod_prev array is defined, with self.sqrt_alphas_cumprod_prev[step] we are getting the value $$\prod_{i=1}^{980} \alpha_i$$ and not $$\prod_{i=1}^{961} \alpha_i$$.
I hope this clarifies my concern.