GenerativeModels
NoiseSchedules cosine seems wrong and leads to division by 0
I wanted to use a DDPMScheduler with the cosine schedule, but sampling produced images filled with NaN.
I quickly inspected the code and found that this is caused by a division by 0 in the step function of the DDPMScheduler class, right here:
pred_original_sample_coeff = (alpha_prod_t_prev ** (0.5) * self.betas[timestep]) / beta_prod_t
current_sample_coeff = self.alphas[timestep] ** (0.5) * beta_prod_t_prev / beta_prod_t
beta_prod_t is equal to 0 at timestep 0 when using the cosine schedule, because it comes from:
alpha_prod_t = self.alphas_cumprod[timestep]
alpha_prod_t_prev = self.alphas_cumprod[timestep - 1] if timestep > 0 else self.one
beta_prod_t = 1 - alpha_prod_t
where alphas_cumprod is calculated like so in this case:
x = torch.linspace(0, num_train_timesteps, num_train_timesteps + 1)
alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
alphas_cumprod /= alphas_cumprod[0].item()
Thus, alphas_cumprod[0] = 1 and, at timestep 0, beta_prod_t = 1 - 1 = 0
I saw no issue reporting this, maybe I am using it wrong. :man_shrugging:
I tried using DDPMScheduler(num_train_timesteps=1000, schedule="cosine") in the 2d_ddpm_compare_schedulers.ipynb notebook and got NaN-filled images as a result.
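The chain of reasoning in this report can be checked without the library. The following is a minimal sketch in plain Python floats rather than torch tensors; the helper name `cosine_alphas_cumprod` and the offset `s = 0.008` are assumptions for illustration, since the thread does not state the value of `s`.

```python
import math

def cosine_alphas_cumprod(num_train_timesteps, s=0.008):
    """Cosine alphas_cumprod as in the quoted snippet, using plain floats.

    s=0.008 is an assumed offset; it is not given in the thread.
    """
    values = []
    for i in range(num_train_timesteps + 1):
        x = i / num_train_timesteps
        values.append(math.cos((x + s) / (1 + s) * math.pi * 0.5) ** 2)
    first = values[0]
    # Normalising by the first entry makes alphas_cumprod[0] exactly 1.0.
    return [v / first for v in values]

alphas_cumprod = cosine_alphas_cumprod(1000)
alpha_prod_t = alphas_cumprod[0]  # value read by step() at timestep 0
beta_prod_t = 1 - alpha_prod_t    # denominator used in step()
print(alpha_prod_t, beta_prod_t)  # prints: 1.0 0.0
```

Because the first entry is divided by itself, the zero denominator does not depend on the choice of `s` or on the number of timesteps.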
Hi there,
In the cosine schedule the alphas/betas are calculated with clipping, so beta_prod_t is not 0 when t=0, as far as I can see:
x = torch.linspace(0, num_train_timesteps, num_train_timesteps + 1)
alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
alphas_cumprod /= alphas_cumprod[0].item()
alphas = torch.clip(alphas_cumprod[1:] / alphas_cumprod[:-1], 0.0001, 0.9999)
betas = 1.0 - alphas
return betas, alphas, alphas_cumprod[:-1]
However, there are documented problems with the cosine scheduler; see the discussion here.
@sRassman reports better results when using leading timesteps here. Could you try that and see if it fixes it for you?
Hi thanks for your quick response,
Indeed, the alphas are clipped in the code snippet you linked, but those are not the values used in the scheduler.
From what I saw, the division by zero happens in the step function of DDPMScheduler.
In the beginning of the step function, some variables are defined:
alpha_prod_t = self.alphas_cumprod[timestep]
alpha_prod_t_prev = self.alphas_cumprod[timestep - 1] if timestep > 0 else self.one
beta_prod_t = 1 - alpha_prod_t
beta_prod_t_prev = 1 - alpha_prod_t_prev
The issue comes from the alpha_prod_t variable, which is equal to 1 at timestep 0 because, with the cosine scheduler enabled, alphas_cumprod is defined like so:
alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
alphas_cumprod /= alphas_cumprod[0].item()
So alpha_prod_t at timestep 0 is always equal to 1, and beta_prod_t = 1 - alpha_prod_t is therefore equal to 0. Later in the step function, values are divided by this same beta_prod_t, which leads to NaN results.
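To make the failure at timestep 0 concrete, here is a rough sketch of the two coefficients in plain Python. Several things are assumptions for illustration: `s = 0.008`, the helper `ieee_div` (a stand-in for tensor division, which returns inf or NaN on a zero denominator where plain Python would raise), the final line combining the coefficients, and the placeholder pixel values.

```python
import math

def ieee_div(num, den):
    """Stand-in for tensor division: x/0 yields inf or nan instead of raising."""
    if den != 0.0:
        return num / den
    if num == 0.0 or math.isnan(num):
        return math.nan
    return math.copysign(math.inf, num)  # sign of the zero is ignored here

N, s = 1000, 0.008  # assumed values; s is not given in the thread

# Cosine schedule as quoted in the thread, with floats instead of tensors.
raw = [math.cos(((i / N) + s) / (1 + s) * math.pi * 0.5) ** 2 for i in range(N + 1)]
cum = [v / raw[0] for v in raw]
alphas = [min(max(cum[i + 1] / cum[i], 0.0001), 0.9999) for i in range(N)]
betas = [1.0 - a for a in alphas]
alphas_cumprod = cum[:-1]  # returned unclipped, so alphas_cumprod[0] == 1.0

timestep = 0
alpha_prod_t = alphas_cumprod[timestep]
alpha_prod_t_prev = alphas_cumprod[timestep - 1] if timestep > 0 else 1.0  # self.one
beta_prod_t = 1 - alpha_prod_t
beta_prod_t_prev = 1 - alpha_prod_t_prev

pred_original_sample_coeff = ieee_div(alpha_prod_t_prev ** 0.5 * betas[timestep], beta_prod_t)
current_sample_coeff = ieee_div(alphas[timestep] ** 0.5 * beta_prod_t_prev, beta_prod_t)

# Assumed combination of the two coefficients (standard DDPM posterior mean);
# 0.5 and 0.3 are arbitrary placeholder pixel values.
pred_prev_sample = pred_original_sample_coeff * 0.5 + current_sample_coeff * 0.3

print(pred_original_sample_coeff, current_sample_coeff, pred_prev_sample)
```

The first coefficient is a positive number divided by zero and the second is zero divided by zero, so any sample built from them is non-finite regardless of the pixel values.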
I will try what @sRassman proposed and give you feedback later :+1:
Ah yes, nice spot. It seems like we should make sure alphas_cumprod is calculated from the clipped alphas before we return it from the cosine scheduler.
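One way this suggestion might look is sketched below in plain Python. This is not the actual patch: the function name `cosine_schedule_consistent` is hypothetical, `s = 0.008` is an assumed default, and rebuilding `alphas_cumprod` as a running product of the clipped alphas is only one possible reading of the comment above.

```python
import math
import operator
from itertools import accumulate

def cosine_schedule_consistent(num_train_timesteps, s=0.008):
    """Hypothetical variant: alphas_cumprod is rebuilt from the clipped alphas."""
    n = num_train_timesteps
    raw = [math.cos(((i / n) + s) / (1 + s) * math.pi * 0.5) ** 2 for i in range(n + 1)]
    # The normalisation by raw[0] cancels in the ratio, so it is skipped here.
    alphas = [min(max(raw[i + 1] / raw[i], 0.0001), 0.9999) for i in range(n)]
    betas = [1.0 - a for a in alphas]
    # Running product of the clipped alphas, so the first entry is at most 0.9999.
    alphas_cumprod = list(accumulate(alphas, operator.mul))
    return betas, alphas, alphas_cumprod

betas, alphas, alphas_cumprod = cosine_schedule_consistent(1000)
beta_prod_t = 1 - alphas_cumprod[0]  # denominator at timestep 0
print(beta_prod_t > 0)
```

With this reading, `alphas_cumprod[t]` is the product of `alphas[0]` through `alphas[t]`, which differs by one index from the `alphas_cumprod[:-1]` currently returned, so any real fix would need to be checked against how the scheduler indexes these values.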
Hi
I came across the same issue of getting NaNs with the cosine schedule due to division by zero. I looked to see if there was an open issue about it, and here it is. Are there immediate plans to fix this? How do you suggest handling it in the meantime?
Thanks, Oded
Dear Oded,
Note that the MONAI Generative Models repository will soon be archived because the code has been integrated into MONAI core (https://github.com/Project-MONAI). Could you check whether the latest version of the schedulers from MONAI core leads to the same error?
If so, we will look at it immediately. Otherwise, please use that alternative repository.
Thank you very much!
Virginia
Hi Virginia
If I understand correctly the code here: https://github.com/Project-MONAI/GenerativeModels/blob/main/generative/networks/schedulers/scheduler.py
has been replaced with: https://github.com/Project-MONAI/MONAI/blob/dev/monai/networks/schedulers/scheduler.py
The cosine function is exactly the same, so I don't expect any difference. I rewrote the code so that it works, and I urge you to fix this issue. A fix would be very valuable.
Thanks Oded
(In reply to Virginia Fernandez's message of Mon, Sep 23, 2024, 11:05 AM, which appears above.)
Dear Oded
Thanks. We will look into it. Could you please open an issue in MONAI core describing the problem, so that we can track and investigate it from there?
Thanks!
Virginia