Sansa Gong
If you don't have GPUs, you can try [colab](https://colab.research.google.com/).
@mainpyp I'm using python 3.9
Hi, we didn't use top-p sampling in our experiment. During sampling, we compute the logits of each token, and you can do top-p sampling or beam search based on these logits....
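For concreteness, here is a minimal sketch of how top-p (nucleus) sampling could be applied on top of those per-token logits; `top_p_sample`, `logits`, and the threshold `p` are illustrative names, not part of the DiffuSeq codebase:

```python
import torch

def top_p_sample(logits: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    """Sample a token id from the smallest set of tokens whose
    cumulative probability exceeds p (nucleus sampling)."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Zero out tokens outside the nucleus; the top-1 token is always kept.
    sorted_probs[cumulative - sorted_probs > p] = 0.0
    sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx.gather(-1, choice)
```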
Hi, maybe you can try adding `keep_in_memory = True` in the function `raw_datasets.map` https://github.com/Shark-NLP/DiffuSeq/blob/bea43e1fd0a954486bc36ad62f2a71dcb2bd300a/diffuseq/text_datasets.py#L78 If it doesn't work, you can try to split your dataset into separate parts and load...
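A rough illustration of the suggested change (the surrounding arguments are placeholders; the real call in `text_datasets.py` may pass different ones):

```python
# Inside diffuseq/text_datasets.py, roughly at the linked line:
tokenized_datasets = raw_datasets.map(
    tokenize_function,    # placeholder name for the per-example tokenizer
    batched=True,
    keep_in_memory=True,  # keep results in RAM instead of an on-disk Arrow cache
)
```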
Hi, we randomly initialized the embeddings and trained them end-to-end, so it is the same setting as Diffusion-LM. We also tried the second setting, which is compared with joint E2E...
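In code terms, the end-to-end setting amounts to something like the following sketch (illustrative sizes, not the exact DiffuSeq module):

```python
import torch.nn as nn

vocab_size, embed_dim = 30522, 128  # illustrative sizes

# Randomly initialized embedding table; its weights receive gradients
# together with the denoising network, i.e. it is trained end-to-end.
word_embedding = nn.Embedding(vocab_size, embed_dim)

# The contrasting setting would freeze a pretrained table instead:
# word_embedding.weight.requires_grad_(False)
```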
Hi, I haven't encountered such a situation before. Which datasets did you use? It seems that $x_t$ with a larger sampled $t$ didn't get sufficient training.
> Do you mean with $x_t$ with a larger sampled $t$ didn't get sufficient training_ that the last 25% of the denoising steps are not yet trained properly however the...
Response to 1: Yes.
Response to 2: Yes, the specific number of $q_n$ is computed after re-weighting.
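For readers unfamiliar with the re-weighting: below is a minimal sketch of loss-aware timestep sampling in the style of improved-diffusion's second-moment resampler, which is my best guess at the $q_n$ being discussed (all names and sizes are illustrative):

```python
import numpy as np

num_timesteps, history_len = 2000, 10
# Recent per-timestep training losses; random placeholders here.
loss_history = np.random.rand(num_timesteps, history_len)

# q_n proportional to sqrt(E[L_n^2]): timesteps with larger recent losses
# are sampled more often during training.
weights = np.sqrt((loss_history ** 2).mean(axis=-1))
q_n = weights / weights.sum()

t = np.random.choice(num_timesteps, p=q_n)  # draw a timestep from q_n
```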
Hi, please refer to #25
It is a metric to trace the NLL of reconstructing $x_0$; by comparing the training NLL against the eval NLL, we can tell whether the training process is normal or not.
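As an illustration (the function and argument names are hypothetical), such an NLL can be computed from the decoder logits of the reconstructed $x_0$ against the reference token ids:

```python
import torch.nn.functional as F

def reconstruction_nll(logits, target_ids):
    # logits: (batch, seq_len, vocab); target_ids: (batch, seq_len)
    # cross_entropy expects (batch, vocab, seq_len), hence the transpose.
    return F.cross_entropy(logits.transpose(1, 2), target_ids)
```

Comparing the curve of this quantity on training batches against eval batches is what flags an abnormal run, e.g. an eval NLL that diverges while the training NLL keeps falling.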