Alex McKinney
Hi @patrickvonplaten, I somewhat misphrased my original question. I'm aware that setting `requires_grad` to `False` prevents that particular parameter from accumulating gradients, essentially stopping the training of those parameters. But why...
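For concreteness, this is the behaviour I mean (the layer here is just a throwaway example):

```python
import torch

layer = torch.nn.Linear(4, 2)
layer.weight.requires_grad_(False)   # freeze just the weight

out = layer(torch.randn(1, 4)).sum()
out.backward()

print(layer.weight.grad)  # None -> no gradient accumulated, weight is frozen
print(layer.bias.grad)    # a tensor -> bias still receives gradients and trains
```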
Okay, thank you @patrickvonplaten! That explanation makes a lot of sense~
So, wrapping `Quantize.forward` in `@torch.cuda.amp.autocast(enabled=False)` and casting the buffers to `torch.float32`? Might also have to cast the input.
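Concretely, something like this minimal sketch (not the actual layer in the repo; the `embed` buffer, shapes, and channels-last input are just illustrative):

```python
import torch
from torch import nn


class Quantize(nn.Module):
    """Minimal VQ layer whose forward is forced to run in FP32 under autocast."""

    def __init__(self, dim: int, n_embed: int):
        super().__init__()
        # codebook stored as a float32 buffer
        self.register_buffer("embed", torch.randn(dim, n_embed))

    @torch.cuda.amp.autocast(enabled=False)
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # autocast upstream may hand us FP16, so cast the input back to FP32
        x = x.float()
        # assumes channels-last input of shape (..., dim)
        flatten = x.reshape(-1, self.embed.shape[0])
        # nearest-codebook-entry lookup, all in FP32
        dist = (
            flatten.pow(2).sum(1, keepdim=True)
            - 2 * flatten @ self.embed
            + self.embed.pow(2).sum(0, keepdim=True)
        )
        embed_ind = dist.argmin(1)
        return self.embed.t()[embed_ind].reshape(*x.shape)
```

With the decorator in place, the codebook lookup stays in FP32 even when the surrounding model runs under `torch.cuda.amp.autocast()`.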
Okay! I can make a pull request for this if you want? If not, I can just close this.
For some reason I can't improve forward-pass speed under FP16 (maybe it is bottlenecked by the FP32 quantize operations?). Memory usage is improved, though. I'll play around with this...
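For reference, this is roughly the comparison I'm doing, a rough wall-clock check where `model` and `x` stand in for the actual network and a batch already on the GPU:

```python
import time
import torch


@torch.no_grad()
def time_forward(model, x, use_amp: bool, iters: int = 50) -> float:
    """Average forward-pass time in seconds, with or without autocast."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.cuda.amp.autocast(enabled=use_amp):
        for _ in range(iters):
            model(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters
```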
@zesameri is that not for CogView rather than CogView2?
@seung-kim I was struggling with this too. I ran the script `scripts/download_first_stages.sh`, which downloaded all the autoencoders. With each autoencoder there is a `config.yaml` file that says the training data...
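Something like this (run from the repo root) lists what each downloaded config points at; the path and the exact keys under `data:` are assumptions on my part, so adjust to what you actually see in the files:

```python
from pathlib import Path
import yaml

# path assumed from where download_first_stages.sh unpacks things; adjust as needed
for cfg_path in sorted(Path("models/first_stage_models").glob("*/config.yaml")):
    cfg = yaml.safe_load(cfg_path.read_text())
    # the training dataset is described under the data: block of each config
    print(cfg_path.parent.name, "->", cfg.get("data", "no data block found"))
```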
Is the OpenAI public decoder (`https://cdn.openai.com/dall-e/decoder.pkl`) perhaps slightly different to the one used in this work? I am having the same issue where the reconstructed outputs from `BeitForMaskedImageModeling` are much...
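For reference, this is roughly how I'm producing reconstructions from the public checkpoints, following the usage example in the DALL-E repo (the random tensor is only a stand-in for a preprocessed 256x256 image):

```python
import torch
import torch.nn.functional as F
from dall_e import load_model, map_pixels, unmap_pixels

dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
enc = load_model("https://cdn.openai.com/dall-e/encoder.pkl", dev)
dec = load_model("https://cdn.openai.com/dall-e/decoder.pkl", dev)

x = map_pixels(torch.rand(1, 3, 256, 256, device=dev))  # stand-in for an image in [0, 1]
z = torch.argmax(enc(x), dim=1)                          # discrete token grid
z = F.one_hot(z, num_classes=enc.vocab_size).permute(0, 3, 1, 2).float()
x_rec = unmap_pixels(torch.sigmoid(dec(z)[:, :3]))       # reconstructed image
```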
[Imagen](https://arxiv.org/abs/2205.11487) (needs no introduction) proposes some interesting improvements in its so-called "Efficient U-Net". Might be worth checking out Appendix B.1 for a summary.
Sorry to necro an old thread (is necro a thing on GitHub :thinking:), but this seemed related and still open. If you prefer, I can open another issue. I've been...