Possible typo in SDXL optimization blog post
https://huggingface.co/blog/simple_sdxl_optimizations
Is there a typo in the reported memory usage for the case where the text embeddings are precomputed? Compared with the default (fp16 + SDPA), it uses slightly more memory and slightly more time, which I feel should be the opposite, no?
cc @sayakpaul
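For context, "precomputing the text embeddings" means running the tokenizer and text encoders once, outside the generation loop, and passing the cached tensors to every subsequent call. A pure-Python schematic of the pattern (the function bodies are stand-ins, not the diffusers API):

```python
def encode_prompt(prompt):
    # stand-in for the tokenizer + text encoders; in SDXL this is the
    # part you want to run only once
    return [ord(c) for c in prompt]

def generate(prompt_embeds, seed):
    # stand-in for the denoising loop, which only needs the embeddings
    return sum(prompt_embeds) + seed

# naive: re-encodes the prompt on every call
images_naive = [generate(encode_prompt("a photo of a cat"), s) for s in range(4)]

# precomputed: encode once, reuse the cached embeddings
cached = encode_prompt("a photo of a cat")
images_cached = [generate(cached, s) for s in range(4)]

assert images_naive == images_cached  # identical outputs either way
```

The intuition behind the question is that, with the encoders no longer needed per call, both memory and time should drop, which is what makes the reported numbers surprising.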
There's not. It's a bit surprising, but all of these numbers are quite dependent on how much compute density a given GPU can sustain. This is what I hinted at in the blog post too.
Thanks! I guess that closes the issue, although I would appreciate it if you could elaborate a little more on that or point me to some resource. I can't wrap my head around the fact that the tokenizer and text encoder are no longer in memory and yet we are using more memory.
https://arxiv.org/abs/2110.12894 might be a good read in this regard.
So, to further clarify this, one could try to measure the memory at various batch sizes and see how the results are affected. If you see anything interesting, don't hesitate to let us know.
I prepared a separate Colab Notebook here as well for easier investigation: https://colab.research.google.com/gist/sayakpaul/b1405b9f2643604d0483f4124c4e56c0/scratchpad.ipynb
With model CPU offloading thrown on top of pre-computing, we have:
Execution time: 14538.2 ms
Max memory allocated: 16.85 GB
Surprises of compute density.