
Typo SDXL Optimization

Open tiagosalvador opened this issue 2 years ago • 5 comments

https://huggingface.co/blog/simple_sdxl_optimizations

Is there a typo in the memory usage reported for the case where the text embeddings are precomputed? Compared with the default (fp16 + SDPA), it uses slightly more memory and slightly more time, which I feel should be the opposite, no?
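
For reference, the optimization in question encodes the prompt once and then drops the text encoders before denoising. A minimal sketch of that pattern, assuming a recent diffusers release that accepts precomputed embeddings and tolerates `None` text encoders (the model ID and prompt are placeholders; the blog post's exact code may differ):

```python
import gc

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Run the two text encoders once and keep only the resulting embeddings.
(
    prompt_embeds,
    negative_prompt_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds,
) = pipe.encode_prompt(prompt="a photo of an astronaut riding a horse", device="cuda")

# Drop the tokenizers and text encoders; they are not needed for denoising.
pipe.text_encoder = None
pipe.text_encoder_2 = None
pipe.tokenizer = None
pipe.tokenizer_2 = None
gc.collect()
torch.cuda.empty_cache()

# Denoise using the cached embeddings instead of a raw prompt.
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
).images[0]
```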

tiagosalvador avatar Oct 27 '23 13:10 tiagosalvador

cc @sayakpaul

pcuenca avatar Oct 27 '23 15:10 pcuenca

There isn't. It's a bit surprising, but all of these results depend heavily on the amount of compute density a given GPU can occupy. This is what I hinted at in the blog post too.

sayakpaul avatar Oct 27 '23 15:10 sayakpaul

Thanks! I guess that closes the issue, although I would appreciate it if you could elaborate a little more on that or point me to a resource. I can't wrap my mind around the fact that the tokenizer and text encoder are no longer in memory and yet we are using more memory.

tiagosalvador avatar Oct 27 '23 21:10 tiagosalvador

https://arxiv.org/abs/2110.12894 might be a good read in this regard.

So, to clarify this further, one could measure the memory usage at various batch sizes and see how the results are affected. If you see anything interesting, don't hesitate to let us know.
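
One way to run that measurement, reusing `pipe` and the embeddings from the sketch above (the batch sizes and step count are arbitrary choices):

```python
import torch

for batch_size in (1, 2, 4, 8):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()  # clear the peak counter between runs

    # Tile the precomputed embeddings along the batch dimension.
    _ = pipe(
        prompt_embeds=prompt_embeds.repeat(batch_size, 1, 1),
        negative_prompt_embeds=negative_prompt_embeds.repeat(batch_size, 1, 1),
        pooled_prompt_embeds=pooled_prompt_embeds.repeat(batch_size, 1),
        negative_pooled_prompt_embeds=negative_pooled_prompt_embeds.repeat(batch_size, 1),
        num_inference_steps=30,
    )

    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"batch_size={batch_size}: peak memory {peak_gb:.2f} GB")
```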

sayakpaul avatar Oct 28 '23 04:10 sayakpaul

I also prepared a separate Colab notebook for easier investigation: https://colab.research.google.com/gist/sayakpaul/b1405b9f2643604d0483f4124c4e56c0/scratchpad.ipynb

With model CPU offloading thrown on top of pre-computing, we have:

Execution time: 14538.2 ms

Max memory allocated: 16.85 GB
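
For reference, stacking offloading on top of precomputation is roughly a one-line change to the earlier sketch: `enable_model_cpu_offload()` replaces the `.to("cuda")` call, so each sub-model is moved to the GPU only for as long as it runs:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # no pipe.to("cuda"); offloading handles placement

# With precomputed embeddings, the text encoders never need to be moved
# onto the GPU during this call.
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
).images[0]
```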

Surprises of compute density.

sayakpaul avatar Oct 28 '23 12:10 sayakpaul