Possible typo in SDXL optimization blog post
https://huggingface.co/blog/simple_sdxl_optimizations
Is there a typo in the reported memory usage for the case where the text embeddings are precomputed? Compared with the default (fp16 + SDPA), it uses slightly more memory and slightly more time, which I feel should be the opposite, no?
cc @sayakpaul
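For context, "precomputing the text embeddings" means running the tokenizer and text encoders once, outside the generation loop, and passing the cached tensors to every subsequent call. A pure-Python schematic of the pattern (the function bodies are stand-ins, not the diffusers API):

```python
def encode_prompt(prompt):
    # stand-in for the tokenizer + text encoders; in SDXL this is the
    # part you want to run only once
    return [ord(c) for c in prompt]

def generate(prompt_embeds, seed):
    # stand-in for the denoising loop, which only needs the embeddings
    return sum(prompt_embeds) + seed

# naive: re-encodes the prompt on every call
images_naive = [generate(encode_prompt("a photo of a cat"), s) for s in range(4)]

# precomputed: encode once, reuse the cached embeddings
cached = encode_prompt("a photo of a cat")
images_cached = [generate(cached, s) for s in range(4)]

assert images_naive == images_cached  # identical outputs either way
```

The intuition behind the question is that, with the encoders no longer needed per call, both memory and time should drop, which is what makes the reported numbers surprising.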
There's not. It's a bit surprising, but all of these numbers are quite dependent on how much compute density a given GPU can sustain. This is what I hinted at in the blog post too.
Thanks! I guess that closes the issue, although I would appreciate it if you could elaborate a little more on that or point me to some resource. I can't wrap my head around the fact that the tokenizer and text encoder are no longer in memory and yet we are using more memory.
https://arxiv.org/abs/2110.12894 might be a good read in this regard.
So, to further clarify this, one could try to measure the memory at various batch sizes and see how the results are affected. If you see anything interesting, don't hesitate to let us know.
I prepared a separate Colab Notebook here as well for easier investigation: https://colab.research.google.com/gist/sayakpaul/b1405b9f2643604d0483f4124c4e56c0/scratchpad.ipynb
With model CPU offloading thrown on top of pre-computing, we have:
Execution time: 14538.2 ms
Max memory allocated: 16.85 GB
Surprises of compute density.