consistencydecoder
Significant performance problems, see profiler screenshot
Running the consistency decoder takes several seconds, and most of that time is spent in a stalled state. Reducing the number of diffusion steps gives no meaningful speedup. When running the code example in the readme, the default SD1.5 decoder is ~100x faster.
I'm on PyTorch 2.0.1 on Linux kernel 6.1 with an RTX 3060.
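For anyone trying to reproduce the comparison, here is a minimal timing sketch. It is not the code behind the numbers above. The `benchmark` helper and its `sync` parameter are hypothetical names made up for illustration. The idea is that CUDA work is queued asynchronously, so the clock should only stop after the device has been synchronized; with PyTorch, `torch.cuda.synchronize` could be passed as `sync`.

```python
import time
from statistics import median


def benchmark(fn, sync=None, warmup=2, repeats=5):
    """Return the median wall-clock time in seconds of fn() over `repeats` runs.

    `sync` is an optional zero-argument callable that blocks until pending
    device work has finished (assumption: torch.cuda.synchronize on a GPU).
    """
    sync = sync or (lambda: None)

    # Warm-up runs absorb one-off costs such as lazy initialization.
    for _ in range(warmup):
        fn()
    sync()

    timings = []
    for _ in range(repeats):
        sync()  # make sure nothing from the previous run is still pending
        start = time.perf_counter()
        fn()
        sync()  # wait for queued work before stopping the clock
        timings.append(time.perf_counter() - start)
    return median(timings)


if __name__ == "__main__":
    # Stand-in workloads; replace with closures around the two decoder calls.
    slow = benchmark(lambda: time.sleep(0.05))
    fast = benchmark(lambda: time.sleep(0.005))
    print(f"slow: {slow:.4f}s  fast: {fast:.4f}s  ratio: {slow / fast:.1f}x")
```

To compare the two decoders, `fn` would be a closure around each decode call from the readme example, run with the same latent and with `sync=torch.cuda.synchronize`.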