DiffusionFromScratch
Why is the network's prediction normalized but not the ground-truth score?
Hi Animadversio!
Firstly, I'd like to commend you on your work; it's truly impressive! I've been closely examining the sample_X_and_score experiment in the ReverseSDE_Diffusion.ipynb notebook and have a few points I'd like to clarify.
In your approach to approximating the analytical score with a neural network, I noticed that no normalization seems to be applied to the ground-truth score when constructing the data/ground-truth pairs. However, in the training loop there appears to be a form of normalization on the marked line below. This has left me a bit puzzled. Specifically, while y_pred is normalized, y_train (which serves as the ground truth) isn't normalized by the timestep-dependent std. This gives the impression that they might be on different scales.
-------- code ----------
for ep in pbar:
    y_pred = score_model_analy(X_train, T_train)
    loss = torch.mean(torch.sum((y_pred - y_train)**2 * std_vec[:, None], dim=(1)))  # <======= here
-------- code ----------
Moreover, in the forward method, the output is normalized:
-------- code ----------
def forward(self, x, t):
    t_embed = self.embed(t)
    pred = self.net(torch.cat((x, t_embed), dim=1))
    pred = pred / self.marginal_prob_std_f(t)[:, None]  # <======= here
    return pred
-------- code ----------
Given this, I'm curious how y_pred and y_train can be compared arithmetically in the loss if they're potentially on different scales.
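To make my concern about the scales concrete, here is a tiny self-contained sketch; the std schedule, shapes, and names below are my own placeholders, not the ones in the notebook:

-------- code ----------
import torch

# Placeholder std schedule; NOT the notebook's marginal_prob_std_f
def toy_std(t):
    return 0.01 + 25.0 * t

t = torch.tensor([0.01, 0.5, 1.0])
x = torch.randn(3, 2)

# For standard-normal toy data perturbed as x_t = x_0 + toy_std(t) * z, the
# perturbed marginal is N(0, (1 + toy_std(t)^2) I), so the analytic score is
# -x / (1 + toy_std(t)^2); this plays the role of y_train here.
y_train = -x / (1.0 + toy_std(t)[:, None] ** 2)

# A network whose forward() divides its raw output by toy_std(t) produces
# values on the order of 1 / toy_std(t); a random tensor stands in for the
# (untrained) raw network output.
raw_out = torch.randn(3, 2)
y_pred = raw_out / toy_std(t)[:, None]

print("y_train scale per t:", y_train.abs().mean(dim=1))
print("y_pred  scale per t:", y_pred.abs().mean(dim=1))
-------- code ----------

With an untrained network, y_pred is pinned to the 1/std scale by the division in forward(), while the analytic y_train is not, which is exactly the mismatch I am asking about.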
I've also taken a look at the Colab notebook you provided (link) and attached an image for reference:
I'd greatly appreciate your insights on this matter. Thank you for your time and patience.
Warm regards,
In this example (https://colab.research.google.com/drive/1_MEFfBdOI06GAuANrs1b8L-BBLn3x-ZJ?usp=sharing#scrollTo=PtAwqb0QQUUn), in the "Defining the loss function" section:
-------- code ----------
z = torch.randn_like(x)                           # get normally distributed noise
perturbed_x = x + z * std[:, None, None, None]    # perturb x with noise of std at time t
score = model(perturbed_x, random_t)              # model output, divided by std in forward()
loss = torch.mean(torch.sum((score * std[:, None, None, None] + z)**2, dim=(1,2,3)))
-------- code ----------
It seems the 'score' is already normalized (the last line in the forward function), while 'z' is just a standard normal tensor drawn from N(0, 1). I think the loss should instead be either
loss = torch.mean(torch.sum((score + z)**2, dim=(1,2,3)))
or
loss = torch.mean(torch.sum((score * std[:, None, None, None] + z * std[:, None, None, None])**2, dim=(1,2,3)))
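For reference, here is a small sketch I used to probe the scales of the individual terms entering the three candidate losses; the std schedule and tensor shapes are placeholders of mine, not the notebook's:

-------- code ----------
import torch

def toy_std(t):
    # placeholder for marginal_prob_std; NOT the schedule used in the notebook
    return 0.01 + 25.0 * t

torch.manual_seed(0)
x = torch.randn(8, 1, 4, 4)                      # toy "images"
random_t = torch.rand(x.shape[0])
std = toy_std(random_t)
z = torch.randn_like(x)
perturbed_x = x + z * std[:, None, None, None]

# Stand-in for model(perturbed_x, random_t): an O(1) raw output divided by std,
# mimicking the normalization in the forward() quoted earlier.
raw = torch.randn_like(x)
score = raw / std[:, None, None, None]

# Magnitudes of the terms paired inside each candidate loss
print("score * std :", (score * std[:, None, None, None]).abs().mean().item())  # O(1)
print("score       :", score.abs().mean().item())                               # O(1/std)
print("z           :", z.abs().mean().item())                                   # O(1)
print("z * std     :", (z * std[:, None, None, None]).abs().mean().item())      # O(std)
-------- code ----------

Whichever form is intended, comparing these magnitudes is what made me wonder which pair of terms is meant to live on the same scale.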