Sansa Gong
Hi, the time you estimate is close to ours with four 80G A100 GPUs. Using FP16 could save training time (we didn't implement this in the current version of the code).
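For reference, here is a minimal sketch of how mixed-precision training could be wired in with PyTorch's built-in AMP. The model, data, and loss below are toy placeholders rather than the repo's actual training loop, and it assumes a CUDA GPU:

```python
import torch
import torch.nn as nn

# Toy stand-in for the denoising network; the real DiffuSeq model differs.
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # rescales the loss to avoid FP16 underflow

for step in range(100):
    x = torch.randn(32, 128, device="cuda")  # dummy batch
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # run the forward pass in FP16 where safe
        loss = ((model(x) - x) ** 2).mean()   # placeholder for the diffusion loss
    scaler.scale(loss).backward()             # backward on the scaled loss
    scaler.step(optimizer)                    # unscale grads, then step
    scaler.update()
```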
Hi, yes, `tT_loss` does not pass through the transformer layers, but there are still learnable params, i.e. the word embedding parameters (from `x_start`). We can regard it as a kind...
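A toy illustration of the point: even though `tT_loss` bypasses the transformer, it is computed from `x_start`, which is built from the word embedding, so the embedding still receives gradients. The schedule constant below is made up for the demo and is not the repo's actual value:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim = 1000, 16
embedding = nn.Embedding(vocab, dim)       # learnable word embedding
tokens = torch.randint(0, vocab, (4, 10))  # dummy batch of token ids

x_start = embedding(tokens)                # x_0 is built from the embedding
# q(x_T | x_0) has mean sqrt(alpha_bar_T) * x_0; tT_loss pushes that mean toward 0.
sqrt_alpha_bar_T = 0.01                    # illustrative value, not the real schedule
tT_loss = (sqrt_alpha_bar_T * x_start).pow(2).mean()

tT_loss.backward()
print(embedding.weight.grad.abs().sum())   # nonzero: the embedding is being trained
```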
Maybe you can try decoding with a single GPU.
Hi, I think the model is not well trained, so it cannot recover meaningful tokens. Maybe you could try other hyper-params. Another concern is that the size of your dataset...
Actually it's not easy, because the training and inference stages are not strictly symmetric. You can try to recover from 50%-noised data instead of pure Gaussian noise.
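If you want to try the 50% suggestion, here is a sketch under assumed names (the helper, the toy linear beta schedule, and the tensor shapes are all illustrative): forward-diffuse the clean embeddings to an intermediate step and start the reverse loop there instead of from pure noise:

```python
import torch

def partial_noise(x0, alphas_cumprod, t):
    """Forward-diffuse clean features x0 to step t, i.e. sample q(x_t | x_0)."""
    eps = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps

# Toy linear schedule; the real code defines its own betas.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1 - betas, dim=0)

x0 = torch.randn(4, 10, 16)   # stand-in for the embedded target sequence
t_start = T // 2              # 50% noising instead of t = T - 1
x_t = partial_noise(x0, alphas_cumprod, t_start)
# ...then run the reverse (denoising) loop from t_start down to 0.
```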
Hi, in the diffusion process, recovering the noise $\epsilon$, $x_0$, or $x_{t-1}$ can all work, as long as the process is symmetric between training and sampling. Previous works show that...
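The parameterizations are interchangeable through the closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, so a prediction of one determines the other. A quick sanity check of that algebra (function names here are illustrative, not the repo's API):

```python
import torch

def x0_from_eps(x_t, eps, a_bar):
    """Recover x_0 from a predicted noise eps at step t."""
    return (x_t - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()

def eps_from_x0(x_t, x0, a_bar):
    """Recover the implied noise from a predicted x_0 at step t."""
    return (x_t - a_bar.sqrt() * x0) / (1 - a_bar).sqrt()

a_bar = torch.tensor(0.3)      # illustrative cumulative-alpha value
x0 = torch.randn(2, 8)
eps = torch.randn_like(x0)
x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps   # sample of q(x_t | x_0)
assert torch.allclose(x0_from_eps(x_t, eps, a_bar), x0, atol=1e-5)
assert torch.allclose(eps_from_x0(x_t, x0, a_bar), eps, atol=1e-5)
```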
DiffuSeq focuses on conditional generation (generating `y` given `x`), while Diffusion-LM focuses on generation with constraints (generating a sentence `s` given an attribute `a`). Using an additional model is orthogonal to...
Currently, the pad is treated as a regular token, and the generated length can change during the generation process. This avoids the need for an additional length-prediction module,...
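A minimal sketch of this padding convention (`PAD_ID` and the helper names are hypothetical, not the repo's API): the target is filled to a fixed maximum length with pad tokens, and the effective output length is whatever remains after stripping pads from the decoded sequence:

```python
PAD_ID = 0  # hypothetical pad token id

def pad_to_max(ids, max_len):
    """Treat PAD as a regular token: fill the target up to max_len."""
    return ids + [PAD_ID] * (max_len - len(ids))

def strip_pads(ids):
    """The effective length is whatever remains after dropping PADs."""
    return [i for i in ids if i != PAD_ID]

seq = pad_to_max([17, 42, 9], max_len=8)        # [17, 42, 9, 0, 0, 0, 0, 0]
# After sampling, the decoded ids may contain PADs; stripping them yields
# a variable-length output with no separate length-prediction module.
print(strip_pads([17, 42, 0, 9, 0, 0, 0, 0]))   # [17, 42, 9]
```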
Hi, we didn't use the saved embedding. The word embedding params are built into the model, so the resume operation can load them.
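In other words, because the embedding is a submodule of the model, a plain `state_dict` save/resume round-trip restores it along with everything else. A toy illustration (`TinyModel` is a stand-in, not the DiffuSeq model):

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Toy stand-in: the word embedding is a submodule of the model."""
    def __init__(self):
        super().__init__()
        self.word_embedding = nn.Embedding(1000, 16)
        self.net = nn.Linear(16, 16)

model = TinyModel()
torch.save(model.state_dict(), "checkpoint.pt")       # embedding saved with the rest

resumed = TinyModel()
resumed.load_state_dict(torch.load("checkpoint.pt"))  # embedding restored too
```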
The rounding operation maps the word embedding vectors back to discrete tokens, and we then map these tokens into vectors again as the input of the next generation step. This operation makes sure that...
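A sketch of one way rounding can be implemented, via nearest-neighbor lookup against the embedding matrix and re-embedding the resulting ids (the actual code may instead use logits from a tied output head; shapes and names here are illustrative):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(1000, 16)      # shared word embedding
x0_hat = torch.randn(4, 10, 16)   # predicted x_0 vectors at some sampling step

with torch.no_grad():
    # Rounding: snap each vector to its nearest embedding row (a token id).
    dists = torch.cdist(x0_hat.reshape(-1, 16), emb.weight)  # (4*10, vocab)
    token_ids = dists.argmin(dim=-1).view(4, 10)

    # Re-embed the rounded tokens as the input for the next step, so the
    # intermediate states stay on the embedding manifold.
    x0_rounded = emb(token_ids)
```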