stablediffusion
* Use Cutlass ops when possible for +15% speed
Use Cutlass ops when possible for a ~15% speedup (for free). Sampling speed goes from 1.80 it/s to 2.08 it/s on an RTX5000.
The change won't trigger if MemoryEfficientAttentionCutlassOp is not available.
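The fallback behaviour described above could look roughly like the sketch below. It is not the code from this PR: `pick_attention_op` and `load_xformers_ops` are hypothetical helper names, and the sketch assumes the op is exposed as an attribute of `xformers.ops`.

```python
import importlib
from types import SimpleNamespace
from typing import Any, Optional


def load_xformers_ops() -> Optional[Any]:
    """Return the xformers.ops module, or None if xformers cannot be imported."""
    try:
        return importlib.import_module("xformers.ops")
    except Exception:  # xformers not installed, or its import failed
        return None


def pick_attention_op(ops_module: Optional[Any]) -> Optional[Any]:
    """Return the Cutlass op if the module exposes it, otherwise None.

    Hypothetical helper: the attribute name is the one mentioned in this PR;
    where it lives (xformers.ops) is an assumption.
    """
    return getattr(ops_module, "MemoryEfficientAttentionCutlassOp", None)


# Real lookup: yields None when xformers or the Cutlass op is missing.
attention_op = pick_attention_op(load_xformers_ops())

# Stand-in modules to show both branches without needing xformers installed.
fake_op = object()
with_cutlass = SimpleNamespace(MemoryEfficientAttentionCutlassOp=fake_op)
without_cutlass = SimpleNamespace()
```

The result would then be passed as `op=attention_op` to `xformers.ops.memory_efficient_attention`. When it is `None`, xformers chooses an operator on its own, so behaviour is unchanged on setups without the Cutlass op.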
@danthe3rd