Sansa Gong comments

Results: 54 comments by Sansa Gong

cannot run demo.py, what does it mean?

PyTorch's multiprocessing does not work properly under Windows unless the entry point is guarded, so you need to add `if __name__ == '__main__':` to the main code. However, when I modify this, I...
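The guard mentioned above can be sketched as follows. This is a minimal illustration using the standard library's `multiprocessing` rather than PyTorch itself; the names `square` and `main` are hypothetical and are not taken from `demo.py`.

```python
import multiprocessing as mp


def square(x):
    # Work done in a worker process; it must live at module level so that
    # worker processes can find it when they import this file.
    return x * x


def main(num_workers=2):
    # On Windows, worker processes are started by re-importing this module.
    # Anything at module level runs again in every worker, so the code that
    # launches workers has to sit behind the guard below.
    with mp.Pool(num_workers) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Only the process that was started directly reaches this line.
    print(main())
```

The same layout should apply to a training or demo script: keep definitions at module level and move everything that starts worker processes into a function called from under the guard.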

Can DPM-Solver support self-conditioning?

Yes! We successfully integrated DPM-Solver into our model in our project [DoT](https://github.com/HKUNLP/diffusion-of-thoughts). Thanks a lot!

DDPM

Yes. For detailed information, please refer to the original paper.

DDPM

@BIT-MJY Hi, yes. When using v2 to speed up, even one or two steps works. For a simple task, we can almost maintain the performance, while for more complex tasks there might...

About loss in training_losses_seq2seq() when time step t=0

Hi, `q_sample()` works for `t=0`, where it returns `x_start`. The model learns the MSE loss between $x_0$ and $\mathrm{Emb}(w^x)$ here.

About loss in training_losses_seq2seq() when time step t=0

Hi, sorry for the ambiguity, let me clarify. For a canonical noise schedule, $\beta_0 \rightarrow 0$, so it returns $x_0$. However, we use the sqrt noise schedule, where $\beta_0=0.121$ when $T=2000$....
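To see where a value like 0.121 can come from, here is a rough sketch of a sqrt schedule. It assumes the cumulative signal level has the form $\bar\alpha(t) = 1 - \sqrt{t + 0.0001}$ and that each beta is derived from the ratio of consecutive $\bar\alpha$ values; both the formula and the helper names are assumptions for illustration and may differ from the repository's code.

```python
import math


def sqrt_alpha_bar(t):
    # Assumed sqrt schedule: cumulative signal level at normalized time t in [0, 1].
    return 1.0 - math.sqrt(t + 0.0001)


def betas_from_alpha_bar(num_steps, alpha_bar, max_beta=0.999):
    # beta_i = 1 - alpha_bar(t_{i+1}) / alpha_bar(t_i), clipped at max_beta.
    betas = []
    for i in range(num_steps):
        t1, t2 = i / num_steps, (i + 1) / num_steps
        betas.append(min(1.0 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


betas = betas_from_alpha_bar(2000, sqrt_alpha_bar)
beta_0 = betas[0]
# Standard deviation of the noise mixed into x_0 at the first step (t=0).
noise_std_0 = math.sqrt(beta_0)
print(f"beta_0 = {beta_0:.4f}, sqrt(beta_0) = {noise_std_0:.3f}")
```

Under these assumptions the noise standard deviation at `t=0` comes out at roughly 0.121 for $T=2000$, so the sample at `t=0` is close to, but not exactly, $x_0$. Whether the quoted 0.121 refers to $\beta_0$ itself or to this standard deviation depends on the notation in the original code.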

'grad_norm' is NaN

We suggest monitoring and logging gradients during training to identify the layer(s) or operation(s) causing the problem.
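One way to do this kind of monitoring is sketched below in a framework-agnostic form: it takes `(name, values)` pairs and reports which entries contain NaN or infinite values. The helper names and the example layer names are hypothetical. In a PyTorch training loop, such pairs could be built after `loss.backward()` from `model.named_parameters()`, using `p.grad.flatten().tolist()` for parameters whose `grad` is not `None`.

```python
import math


def grad_norm(values):
    # L2 norm of a flat list of gradient values; NaN propagates to the result.
    return math.sqrt(sum(v * v for v in values))


def find_nonfinite_grads(named_grads):
    # Return the names of entries that contain NaN or +/-inf.
    bad = []
    for name, values in named_grads:
        if any(not math.isfinite(v) for v in values):
            bad.append(name)
    return bad


# Hypothetical layer names and values standing in for real gradients.
example = [
    ("embed.weight", [0.1, -0.2, 0.05]),
    ("encoder.layer0.weight", [float("nan"), 0.3]),
    ("head.bias", [0.0, 0.01]),
]

for name, values in example:
    print(f"{name}: norm={grad_norm(values):.4f}")
print("non-finite:", find_nonfinite_grads(example))
```

Logging the per-layer norm at every step also shows whether the norm grows steadily before turning into NaN, which usually narrows the search to a few layers.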

Taken <Pad> as a regular token could make model only learn the <Pad> information?

Hi, in our experience, sufficient training avoids this situation. Another option is to omit the loss computation for that token in the training code. Both of them...
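The second option, leaving the pad token out of the loss, could look roughly like this. It is a plain-Python sketch, not the repository's training code: `masked_mse`, its arguments, and the choice of `pad_id=0` are assumptions for illustration.

```python
def masked_mse(pred, target, token_ids, pad_id):
    # pred, target: one vector (list of floats) per sequence position.
    # token_ids: the token id at each position; pad positions are skipped.
    total, count = 0.0, 0
    for p_vec, t_vec, tok in zip(pred, target, token_ids):
        if tok == pad_id:
            continue  # padding contributes nothing to the loss
        total += sum((p - t) ** 2 for p, t in zip(p_vec, t_vec)) / len(p_vec)
        count += 1
    # Average over non-pad positions only; guard against an all-pad sequence.
    return total / max(count, 1)


pred = [[1.0, 1.0], [5.0, 5.0]]
target = [[0.0, 0.0], [0.0, 0.0]]
token_ids = [7, 0]  # second position is padding (hypothetical pad_id = 0)
print(masked_mse(pred, target, token_ids, pad_id=0))
```

Averaging over the non-pad positions, rather than over the full padded length, keeps the loss scale comparable between short and long sequences.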

Taken <Pad> as a regular token could make model only learn the <Pad> information?

@swave-demo Hi, this is a good point. Let me explain. The input mask plays two roles: a. keep the x input part un-noised; b. mask out the MSE loss of...

Issues with decoding and evaluation

This is because the model `ema_0.9999_010000.pt.samples` hasn't converged yet. Try `ema_0.9999_050000.pt.samples`.