Sansa Gong

Results 54 comments of Sansa Gong

PyTorch's multi-threaded library does not work properly on Windows, so you need to add `if __name__ == '__main__':` to the main code. However, when I modify this, I...

Yes! We successfully integrated DPM-Solver into the model in our project [DoT](https://github.com/HKUNLP/diffusion-of-thoughts). Thanks a lot!

Yes. For detailed information, please refer to the original paper.

@BIT-MJY Hi, yes. When using v2 for speed-up, one or two steps also work. For a simple task, we can almost maintain the performance, while for more complex tasks there might...

Hi, `q_sample()` also works for `t=0`, in which case it returns `x_start`. The model learns the MSE loss between $x_0$ and $Emb(w^x)$ here.

Hi, sorry for the ambiguity, let me clarify. For a canonical noise schedule, $\beta_0 \rightarrow 0$, so it returns $x_0$. However, we use the sqrt noise schedule, where $\beta_0=0.121$ when $T=2000$....
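To make the `t=0` behaviour concrete, here is a minimal sketch of a sqrt noise schedule. It assumes the form $\bar\alpha(u) = 1 - \sqrt{u + 0.0001}$ with $u = t/T$ (the form used in Diffusion-LM) and derives betas from ratios of consecutive $\bar\alpha$ values. The function names and the offset `s` are illustrative assumptions, not taken from this repository. Under these assumptions the value 0.121 comes out as the noise standard deviation $\sqrt{1-\bar\alpha_0} = \sqrt{\beta_0}$ of the first step, so a sample drawn at `t=0` is already slightly perturbed and is not exactly `x_start`.

```python
import math

def sqrt_alpha_bar(u, s=1e-4):
    """Cumulative signal level at normalized time u = t/T (assumed sqrt form)."""
    return 1.0 - math.sqrt(u + s)

def sqrt_schedule_betas(T, max_beta=0.999):
    """Per-step betas from ratios of consecutive alpha_bar values, clipped at max_beta."""
    betas = []
    for i in range(T):
        a1 = sqrt_alpha_bar(i / T)
        a2 = sqrt_alpha_bar((i + 1) / T)
        betas.append(min(1.0 - a2 / a1, max_beta))
    return betas

T = 2000
betas = sqrt_schedule_betas(T)

# At the first step, alpha_bar_0 = 1 - beta_0, and a forward sample is
# sqrt(alpha_bar_0) * x_start + sqrt(1 - alpha_bar_0) * noise.
alpha_bar_0 = 1.0 - betas[0]
signal_coef_0 = math.sqrt(alpha_bar_0)
noise_std_0 = math.sqrt(1.0 - alpha_bar_0)

print(f"beta_0           = {betas[0]:.4f}")      # 0.0146
print(f"signal coef, t=0 = {signal_coef_0:.4f}")
print(f"noise std,  t=0  = {noise_std_0:.3f}")   # 0.121
```

With a linear schedule that starts near zero, the same two coefficients would be close to 1 and 0, which is the case where the first step effectively returns $x_0$.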

I suggest monitoring and logging gradients during training to identify the layer(s) or operation(s) causing the problem.
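As a rough illustration of that kind of check, the sketch below scans per-layer gradients and reports any layer whose gradient norm is non-finite or unusually large. It is framework-agnostic and uses plain Python lists. The helper names, the layer names in the example, and the `max_norm` threshold are all hypothetical.

```python
import math

def grad_norm(values):
    """L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(v * v for v in values))

def find_suspect_layers(named_grads, max_norm=1e3):
    """Return (name, norm) for layers whose gradient norm is NaN/inf or above max_norm."""
    suspects = []
    for name, values in named_grads:
        norm = grad_norm(values)
        if not math.isfinite(norm) or norm > max_norm:
            suspects.append((name, norm))
    return suspects

# Hypothetical gradients for three layers after one backward pass.
example = [
    ("embedding.weight", [0.01, -0.02, 0.005]),
    ("encoder.layer0.weight", [3000.0, -4000.0]),
    ("output.bias", [float("nan"), 0.1]),
]

for name, norm in find_suspect_layers(example):
    print(f"step 0: suspicious gradient in {name}: norm={norm}")
```

In a PyTorch training loop, the `(name, values)` pairs could be built after `loss.backward()` from `model.named_parameters()`, flattening each parameter's `.grad` and skipping parameters whose `.grad` is `None`.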

Hi, in our experience, sufficient training avoids this situation. Another option is to omit the computation of the token's loss in the training code. Both of them...

@swave-demo Hi, this is a good point. Let me explain. The input mask plays two roles: (a) keeping the x input part un-noised; (b) masking out the MSE loss of...
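Role (a) can be sketched as a per-position selection between the clean and the noised sequence. This minimal example uses plain Python lists with one scalar per position. It assumes that a mask value of 0 marks the x (input) positions and 1 marks the positions to be noised. The function name and the sample values are hypothetical, and it covers role (a) only.

```python
def keep_input_unnoised(x_start, x_noised, input_mask):
    """Per position: keep the clean value where input_mask == 0, else take the noised one."""
    return [
        clean if m == 0 else noisy
        for clean, noisy, m in zip(x_start, x_noised, input_mask)
    ]

x_start = [0.5, -1.0, 0.3, 0.8]    # clean sequence (x part followed by the rest)
x_noised = [0.7, -0.6, 0.9, 0.1]   # same sequence after the forward noising step
input_mask = [0, 0, 1, 1]          # assumed convention: 0 = x part, 1 = noised part

x_t = keep_input_unnoised(x_start, x_noised, input_mask)
print(x_t)  # [0.5, -1.0, 0.9, 0.1]
```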

Because the model `ema_0.9999_010000.pt.samples` hasn't converged yet. Try `ema_0.9999_050000.pt.samples`.