
How to understand the loss(loss_q0,loss_q1,loss_q2)?

Open shenxiaochenn opened this issue 2 years ago • 6 comments


    | grad_norm | 0.0913   |
    | loss      | 0.0621   |
    | loss_q0   | 0.17     |
    | loss_q1   | 0.0455   |
    | loss_q2   | 0.0209   |
    | loss_q3   | 0.00557  |
    | mse       | 0.0583   |
    | mse_q0    | 0.156    |
    | mse_q1    | 0.0453   |
    | mse_q2    | 0.0208   |
    | mse_q3    | 0.00549  |
    | samples   | 2.85e+06 |
    | step      | 1.11e+04 |
    | vb        | 0.00376  |
    | vb_q0     | 0.0143   |
    | vb_q1     | 0.000231 |
    | vb_q2     | 0.000103 |
    | vb_q3     | 7.26e-05 |

Hi, I got this output from your code, but I can't figure out what q0, q1, q2, and q3 mean here. Thanks!

shenxiaochenn avatar Jan 28 '23 07:01 shenxiaochenn

hi bros, have you understood these? I am confused, too.

chensming avatar Feb 25 '23 03:02 chensming

> hi bros, have you understood these? I am confused, too.

I must say, I don't know.

shenxiaochenn avatar Feb 25 '23 03:02 shenxiaochenn

Have you figured it out by now, bro?

aobusi avatar Mar 14 '23 21:03 aobusi

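    for sub_t, sub_loss in zip(ts.cpu().numpy(), values.detach().cpu().numpy()):
        # Map each sample's timestep to a quartile index (0 to 3) of the timestep range
        quartile = int(4 * sub_t / diffusion.num_timesteps)
        # Average the per-sample loss under a per-quartile key, e.g. "loss_q0"
        logger.logkv_mean(f"{key}_q{quartile}", sub_loss)

In other words, the reverse process that recovers the original image runs over the steps from 0 to T, and the loss is computed at several sampled steps along the way.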

theneao avatar Mar 16 '23 14:03 theneao

Thanks.

aobusi avatar Mar 17 '23 09:03 aobusi

I figured it out after a while. In diffusion, the loss is the sum of many sub-terms (see the variational lower bound formulation in the relevant paper), each corresponding to one denoising (diffusion) step of the diffusion process. Denote the loss term for the $i^{\rm th}$ diffusion step by $L_i$. Suppose the diffusion process has $n_{\rm timesteps}$ steps and the logger writes its output every $n_{\rm training}$ training steps. Denote by $L_{i,j}$ the loss term for the $i^{\rm th}$ diffusion step, computed at the $j^{\rm th}$ training step since the last log output; in the code quoted above, $i$ is the timestep `sub_t` of each training sample. The loss_qk values ($k = 0, 1, 2, 3$) reported to the user are then calculated as follows:

  • $0\le i/n_{\rm timesteps} < 0.25$: $L_{i,j}$ is assigned to a set corresponding to loss_q0
  • $0.25\le i/n_{\rm timesteps} < 0.5$: $L_{i,j}$ is assigned to a set corresponding to loss_q1
  • $0.5\le i/n_{\rm timesteps} < 0.75$: $L_{i,j}$ is assigned to a set corresponding to loss_q2
  • $0.75\le i/n_{\rm timesteps} < 1.0$: $L_{i,j}$ is assigned to a set corresponding to loss_q3

Finally, each loss_qk value ($k = 0, 1, 2, 3$) we see in the log file is the average over all $L_{i,j}$ in its respective set.
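Written out as a formula, this could look as follows. The symbols $S_k$ (the set of index pairs that fall into quartile $k$) and $\bar{L}_k$ (the value logged as loss_qk) are notation introduced here for illustration; they do not appear in the code.

$$
S_k = \left\{ (i, j) : \left\lfloor \frac{4\,i}{n_{\rm timesteps}} \right\rfloor = k \right\},
\qquad
\bar{L}_k = \frac{1}{|S_k|} \sum_{(i,j) \in S_k} L_{i,j},
\qquad k \in \{0, 1, 2, 3\}
$$

The floor expression matches `int(4 * sub_t / diffusion.num_timesteps)` in the snippet quoted earlier in this thread.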

In other words, each loss_qk ($k = 0, 1, 2, 3$) is a measure of how well the $k^{\rm th}$ quartile of the diffusion process (i.e. the $k^{\rm th}$ quarter of the denoising steps, counting from zero) is performing.
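To make the bucketing concrete, here is a minimal pure-Python sketch of the same idea. It does not use the repository's logger or PyTorch; the function names `quartile_index` and `mean_loss_by_quartile` are hypothetical, and the timesteps and loss values are made up purely for illustration.

```python
from collections import defaultdict


def quartile_index(t, num_timesteps):
    """Quartile (0 to 3) of timestep t, using the same formula as the quoted snippet."""
    return int(4 * t / num_timesteps)


def mean_loss_by_quartile(timesteps, losses, num_timesteps, key="loss"):
    """Average per-sample losses within each timestep quartile.

    Hypothetical helper: it mimics the effect of calling a running-mean
    logger once per sample with the key f"{key}_q{quartile}".
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, loss in zip(timesteps, losses):
        name = f"{key}_q{quartile_index(t, num_timesteps)}"
        sums[name] += loss
        counts[name] += 1
    return {name: sums[name] / counts[name] for name in sums}


# Made-up data: five samples with timesteps drawn from a 1000-step process.
example = mean_loss_by_quartile(
    timesteps=[10, 200, 300, 600, 999],
    losses=[0.25, 0.75, 0.5, 0.125, 0.0625],
    num_timesteps=1000,
)
print(example)
```

Here timesteps 10 and 200 both land in quartile 0, so their losses are averaged into `loss_q0`, while 300, 600, and 999 land in quartiles 1, 2, and 3 respectively.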

Stamatis8 avatar May 28 '24 23:05 Stamatis8