richcmwang
The loss goes up with the default parameters/setup very early in training, even before the discriminator kicks in. I am puzzled by this.
For people who have successfully trained VQGAN, do you see the quantization loss increase over time?
Generating seems to be tricky because DeepSpeed, DataParallel, etc. only dispatch work through an `nn.Module`'s `forward`. But the following code works for me to balance the GPUs...
It does not improve the speed of a single batch on one GPU, but with 2 (or more) GPUs it does improve the speed. In my test case, the running time ratio...
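The idea, roughly, is a thin `nn.Module` wrapper whose `forward` calls `generate_images`, so `nn.DataParallel` can scatter the text batch across GPUs. This is a minimal sketch of that idea (not the exact code, and it assumes `generate_images(text, filter_thres=...)` matches the `DALLE` version you are on):

```python
from torch import nn

class GenerateWrapper(nn.Module):
    """Route generation through forward() so nn.DataParallel can split the batch."""
    def __init__(self, dalle):
        super().__init__()
        self.dalle = dalle

    def forward(self, text, filter_thres = 0.9):
        # each replica receives its own slice of the text batch
        return self.dalle.generate_images(text, filter_thres = filter_thres)

# usage sketch: dalle is a trained model, text is a (batch, text_seq_len) LongTensor on cuda:0
# wrapper = nn.DataParallel(GenerateWrapper(dalle).cuda())
# images  = wrapper(text)   # outputs are gathered back to cuda:0
```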
@afiaka87 Please feel free to incorporate this. I tried [inference](https://github.com/microsoft/DeepSpeed/blob/master/docs/_tutorials/inference-tutorial.md) but get either an incorrect key "checkpoint_path" or an unknown type "DeepSpeed" error message. Not sure the doc is accurate. My `checkpoint.json`: ...
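For context, the call I was attempting looks roughly like the sketch below. It follows the tutorial linked above, but the argument values and the checkpoint-json handling are my assumptions, so treat it as a sketch rather than a verified recipe:

```python
import torch
import deepspeed

# sketch only: dalle is the trained model (a plain nn.Module); checkpoint is a
# placeholder for a checkpoint-description json as described in the tutorial
ds_engine = deepspeed.init_inference(
    dalle,
    mp_size = 1,               # model-parallel degree
    dtype = torch.half,        # or torch.float
    checkpoint = None,         # or the path to the checkpoint json
    replace_method = 'auto',   # let DeepSpeed decide where to inject kernels
)
model = ds_engine.module       # run generation on the wrapped module afterwards
```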
@afiaka87 Thanks for all the information! The in-depth video discussion is really interesting. I was training `VQGAN` and found it difficult to train compared to `DiscreteVAE`, so I started looking into...
@afiaka87 @bob80333 Thanks for pointing this out. Yes, I probably need to train for a very long time from scratch before seeing reasonable reconstructions. I loaded the pretrained checkpoint and...
`sparse_attn` does not seem to have any effect on the `Dalle` or `Transformer`. I am also interested in the difference between `full` (`Attention`) and `sparse` (`SparseAttention`). Are they different implementations...
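For reference, my (unverified) understanding is that the per-layer attention type is selected via an `attn_types` argument when constructing the model; a sketch, assuming that keyword and with placeholder hyperparameters:

```python
from dalle_pytorch import DALLE

# sketch: vae is an already-trained VAE wrapper, numbers are placeholders
dalle = DALLE(
    dim = 512,
    vae = vae,
    num_text_tokens = 10000,
    text_seq_len = 256,
    depth = 4,
    heads = 8,
    attn_types = ('full', 'sparse')   # cycled across layers; 'sparse' maps to SparseAttention
)
```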