AITemplate
Support arbitrary width & height in stable diffusion example
It would be nice to be able to use different heights and widths up to 1024x1024.
If anyone is interested in working on this, I will pay a $10,000 prize for doing it. DM me on Twitter if so: https://twitter.com/suhail
Performance (throughput, memory) should not degrade by more than 15%.
Deadline: Nov 30, 2022 (whoever is first)
For anyone who is interested, here is how to do it for basic enablement:
- check dynamic codegen using a MIN strategy to create a default kernel instance
- may need to unfuse gemm + permute
If you know how to do it, it is probably 2 hours of work.
For best performance:
- Method 1: Learn how HINT profiling works, and consider a better layout for gemm + permute fusion (more complex).
- Method 2: Do not fold weights during compilation; compile multiple instances for different shapes and do bucketing. To avoid wasting memory, modify codegen so that blob memory is passed in from outside (3-4 hours if you know what to do).
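To make the bucketing idea in Method 2 concrete, here is a minimal sketch in plain Python. It is not AITemplate code: the bucket edges, `pick_bucket`, `BucketedModules`, and the `compile_fn` callback are all hypothetical names made up for illustration, and the only number taken from this thread is the 1024x1024 upper bound. It only shows the dispatch logic: round the requested size up to the nearest bucket, then compile one instance per bucket on first use and reuse it afterwards.

```python
from bisect import bisect_left
from typing import Callable, Dict, List, Tuple

# Hypothetical bucket edges in pixels (assumption). Only the 1024 upper
# bound comes from the issue; the intermediate edges are illustrative.
BUCKET_EDGES: List[int] = [256, 512, 768, 1024]


def round_up_to_bucket(size: int, edges: List[int] = BUCKET_EDGES) -> int:
    """Return the smallest bucket edge that is >= size."""
    if size <= 0:
        raise ValueError(f"size must be positive, got {size}")
    idx = bisect_left(edges, size)
    if idx == len(edges):
        raise ValueError(f"size {size} exceeds the largest bucket {edges[-1]}")
    return edges[idx]


def pick_bucket(height: int, width: int) -> Tuple[int, int]:
    """Map a requested (height, width) to its (height, width) bucket."""
    return round_up_to_bucket(height), round_up_to_bucket(width)


class BucketedModules:
    """Compile one module instance per bucket on first use, then reuse it.

    compile_fn is a stand-in for whatever builds a module for a given
    bucket shape; it is an assumption, not a real AITemplate entry point.
    """

    def __init__(self, compile_fn: Callable[[int, int], object]) -> None:
        self._compile_fn = compile_fn
        self._cache: Dict[Tuple[int, int], object] = {}

    def get(self, height: int, width: int) -> object:
        key = pick_bucket(height, width)
        if key not in self._cache:
            self._cache[key] = self._compile_fn(*key)
        return self._cache[key]


if __name__ == "__main__":
    compiled = []  # records which bucket shapes were actually compiled

    def fake_compile(h: int, w: int) -> str:
        compiled.append((h, w))
        return f"module_{h}x{w}"

    modules = BucketedModules(fake_compile)
    print(modules.get(512, 640))  # falls in the 512x768 bucket
    print(modules.get(500, 700))  # same bucket, no second compile
    print(compiled)
```

Fewer, wider buckets mean fewer compiled instances but more work wasted on shapes near the bottom of a bucket. How a shape inside a bucket is actually served (a dynamic range per instance, or padding up to the bucket size) is left out of the sketch, as is sharing one externally allocated memory blob across instances.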
We will release some code that makes static shapes run 20% faster, so that the current PyTorch pipeline runs in around 1 sec at batch 1. We can see it is able to run under 1 sec, maybe an extra 10% - 20% even after the 20% speedup, but our work is focused more on Meta's internal workloads than on optimizing diffusion models, so it may take a while.
FYI: v0.1.1 is released: https://github.com/facebookincubator/AITemplate/pull/74
The new attention is friendlier to dynamic shapes, and the new runtime supports external memory allocators.