improved-diffusion
How to understand the loss terms (loss_q0, loss_q1, loss_q2)?
| grad_norm | 0.0913 |
| loss | 0.0621 |
| loss_q0 | 0.17 |
| loss_q1 | 0.0455 |
| loss_q2 | 0.0209 |
| loss_q3 | 0.00557 |
| mse | 0.0583 |
| mse_q0 | 0.156 |
| mse_q1 | 0.0453 |
| mse_q2 | 0.0208 |
| mse_q3 | 0.00549 |
| samples | 2.85e+06 |
| step | 1.11e+04 |
| vb | 0.00376 |
| vb_q0 | 0.0143 |
| vb_q1 | 0.000231 |
| vb_q2 | 0.000103 |
| vb_q3 | 7.26e-05 |
Hi, I got this output from your code, but I can't figure out what the q0, q1, q2, q3 suffixes mean here. Thanks!
Hi, have you figured these out? I am confused, too.
I must say, I don't know.
Do you know now, bro?
# Bucket each sample's loss by the quartile (0..3) that its timestep falls in.
for sub_t, sub_loss in zip(ts.cpu().numpy(), values.detach().cpu().numpy()):
    quartile = int(4 * sub_t / diffusion.num_timesteps)
    logger.logkv_mean(f"{key}_q{quartile}", sub_loss)
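It means that during the reverse process that recovers the original image, going from step 0 to step T, the loss is computed at a few sampled points along the way.
Thanks.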
I figured it out after a while. In diffusion, the loss is the sum of many sub-terms (see the variational lower bound formulation in the relevant paper), each corresponding to one denoising (diffusion) step of the diffusion process. Denote the loss term for the $i^{\rm th}$ diffusion step by $L_i$. Suppose the diffusion process has $n_{\rm timesteps}$ steps and the logger writes its output every $n_{\rm training}$ training steps. Then, writing $L_{i,j}$ for the $i^{\rm th}$ loss term at the $j^{\rm th}$ training step within a logging interval, the loss_q0 to loss_q3 terms reported to the user are calculated as follows:
- $0\le i/n_{\rm timesteps} < 0.25$: $L_{i,j}$ is assigned to the set corresponding to loss_q0
- $0.25\le i/n_{\rm timesteps} < 0.5$: $L_{i,j}$ is assigned to the set corresponding to loss_q1
- $0.5\le i/n_{\rm timesteps} < 0.75$: $L_{i,j}$ is assigned to the set corresponding to loss_q2
- $0.75\le i/n_{\rm timesteps} < 1.0$: $L_{i,j}$ is assigned to the set corresponding to loss_q3
Finally, each of the loss_q0 to loss_q3 terms we see in the log file is the average over all $L_{i,j}$ in its respective set.
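Written as a formula, this averaging might look as follows. The set name $S_k$ and the quartile index $k$ are notation introduced here for illustration; they do not appear in the code.

$$
S_k = \left\{ (i,j) : \left\lfloor \frac{4\,i}{n_{\rm timesteps}} \right\rfloor = k \right\},
\qquad
\text{loss\_q}k = \frac{1}{|S_k|} \sum_{(i,j)\in S_k} L_{i,j},
\qquad k \in \{0,1,2,3\}.
$$

The floor matches the `int(4 * sub_t / diffusion.num_timesteps)` truncation in the snippet quoted earlier, since timesteps are non-negative.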
In other words, each of loss_q0 through loss_q3 measures how well the model is performing on the corresponding quarter of the diffusion process (i.e. that quarter of the denoising steps), with loss_q0 covering the earliest timesteps and loss_q3 the latest.
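To make the bucketing concrete, here is a minimal standalone sketch of the same idea in plain NumPy. It is not the repository's logger: `quartile_means` is a hypothetical helper, the timesteps and loss values are made up, and it averages over a single batch, whereas the real `logger.logkv_mean` keeps a running mean across all training steps between log dumps.

```python
from collections import defaultdict

import numpy as np


def quartile_means(ts, losses, num_timesteps):
    """Average per-sample losses within each timestep quartile (hypothetical helper)."""
    buckets = defaultdict(list)
    for t, loss in zip(ts, losses):
        # Same binning rule as the snippet quoted in this thread: 0, 1, 2 or 3.
        quartile = int(4 * t / num_timesteps)
        buckets[f"loss_q{quartile}"].append(float(loss))
    return {name: float(np.mean(vals)) for name, vals in sorted(buckets.items())}


# Made-up batch: one sampled timestep and one loss value per example.
ts = np.array([10, 240, 260, 600, 990])
losses = np.array([0.20, 0.10, 0.05, 0.02, 0.01])

result = quartile_means(ts, losses, num_timesteps=1000)
print(result)
```

With these inputs the first two samples (timesteps 10 and 240) land in loss_q0 and are averaged together, while the remaining three each land in their own bucket.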