Bing Xu
This change will break the AMD MI250 benchmark. A better way is to re-enable hint-based dynamic shape support in this version at compile time, then at benchmark time we can...
We will make it this week after making the v0.1 tag. On Mon, Oct 10, 2022 at 20:33 Mark Saroufim ***@***.***> wrote: > See title, would be nice to pip...
Thanks @illsilin! Please sign the CLA. Also could you sign off on the performance & correctness on all examples?
This sounds like it could be a torch caching allocator issue. Will investigate today. Thanks for letting us know.
https://github.com/facebookincubator/AITemplate/pull/43
According to https://github.com/facebookincubator/AITemplate/pull/43 it looks like CUDA graph caused a CPU memory leak. While we debug whether it is caused by the AIT side or the CUDA Graph side, we can disable...
@mikeiovine fixed the bug. Will do a sync today to fix this issue.
For 512x512, AIT is running at 42 it/s vs 27 it/s in this PR, so I don't think we need to support xformer. Looping in @terrychenism for 1024x1024 generation.
If it runs out of memory, we may consider setting the UNet batch size to 1 and running it twice in each step; this will save a lot of memory.
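A minimal sketch of that idea, not AITemplate code: instead of one forward pass over a batch of 2, the model is called once per sample, so only one sample's activations need to be live at a time. `run_in_chunks`, `toy_unet`, and `guided_step` are hypothetical names, and the guidance formula assumes the batch of 2 comes from classifier-free guidance (one unconditional and one conditional input), which the comment does not state.

```python
from typing import Callable, List, Sequence


def run_in_chunks(
    model: Callable[[Sequence[float]], List[float]],
    batch: Sequence[float],
    chunk_size: int = 1,
) -> List[float]:
    """Run `model` on `batch` in slices of `chunk_size` and join the outputs."""
    outputs: List[float] = []
    for start in range(0, len(batch), chunk_size):
        # Each call only sees `chunk_size` samples at once.
        outputs.extend(model(batch[start:start + chunk_size]))
    return outputs


def toy_unet(latents: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a UNet forward pass (doubles each input)."""
    return [x * 2.0 for x in latents]


def guided_step(
    model: Callable[[Sequence[float]], List[float]],
    uncond: float,
    cond: float,
    guidance_scale: float,
) -> float:
    """One denoising step with two batch-size-1 calls instead of one batch of 2.

    Assumption: the two inputs are the unconditional and conditional branches
    of classifier-free guidance.
    """
    noise_uncond, noise_cond = run_in_chunks(model, [uncond, cond], chunk_size=1)
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)


if __name__ == "__main__":
    print(run_in_chunks(toy_unet, [1.0, 2.0], chunk_size=1))
    print(guided_step(toy_unet, 1.0, 2.0, guidance_scale=7.5))
```

The trade-off is two sequential forward passes per step, so this lowers peak memory at the cost of per-step throughput.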
It is not open sourced in this release. There is an FX2AIT project that will probably be open sourced in the future. On Tue, Oct 4, 2022 at 05:49 Lee kwang...