stable-diffusion-webui

To make it faster to create

Open imgai-newbey opened this issue 2 years ago • 9 comments

To make image generation faster, add the following line:

torch.backends.cuda.matmul.allow_tf32 = True

imgai-newbey avatar Jan 06 '23 05:01 imgai-newbey

Some more context would be great.

AUTOMATIC1111 avatar Jan 06 '23 05:01 AUTOMATIC1111

Adding torch.backends.cuda.matmul.allow_tf32 = True makes generation faster. In my tests it was slightly faster than xformers, and because the two can be combined rather than used one at a time, using them together was about 2 to 3 times faster.

imgai-newbey avatar Jan 06 '23 06:01 imgai-newbey

The principle is that fp32 matrix multiplications are carried out in tf32 instead, which increases calculation speed.
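To make the fp32-versus-tf32 trade-off concrete, here is a minimal pure-Python sketch that rounds a value to the TF32 layout (8 exponent bits, 10 mantissa bits, versus 23 mantissa bits in fp32). The helper name to_tf32 is hypothetical, and round-to-nearest-even is an assumption; actual hardware rounding may differ. It only illustrates how much precision the inputs to a matmul lose, not how the GPU performs the multiplication.

```python
import struct

DROPPED_BITS = 13  # fp32 has 23 mantissa bits, TF32 keeps the top 10


def to_tf32(x: float) -> float:
    """Round a finite value in fp32 range to the nearest TF32-representable value.

    Hypothetical helper for illustration; assumes round-to-nearest-even.
    """
    # Reinterpret the fp32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the bits that will be dropped.
    lsb = (bits >> DROPPED_BITS) & 1
    bits += (1 << (DROPPED_BITS - 1)) - 1 + lsb
    # Clear the dropped mantissa bits and stay within 32 bits.
    bits &= ~((1 << DROPPED_BITS) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (1.0, 0.1, 3.14159265, 1 + 2**-12):
        rounded = to_tf32(value)
        rel_err = abs(rounded - value) / abs(value)
        print(f"{value!r} -> {rounded!r} (relative error {rel_err:.2e})")
```

With 10 mantissa bits, the relative rounding error of a single input is bounded by about 2**-11, compared with about 2**-24 for fp32.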

imgai-newbey avatar Jan 06 '23 06:01 imgai-newbey

Adding torch.backends.cuda.matmul.allow_tf32 = True makes creation faster. When tested, the speed is slightly faster than xformers, and you can enjoy the miracle that the speed is about 2 to 3 times faster if you use them together because you can overlap them instead of using only one of them.

Details? Benchmarks, environment, hardware configurations?
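One way to gather such numbers is a small timing harness like the sketch below. The function name time_matmul and the matrix size are hypothetical choices, and it only times raw fp32 matrix multiplication on the GPU, not end-to-end image generation in the webui. It returns None when PyTorch or a CUDA device is not available.

```python
import time


def time_matmul(allow_tf32: bool, n: int = 4096, iters: int = 20):
    """Return mean seconds per n x n fp32 matmul, or None without PyTorch/CUDA.

    Hypothetical benchmark helper for illustration only.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    previous = torch.backends.cuda.matmul.allow_tf32
    torch.backends.cuda.matmul.allow_tf32 = allow_tf32
    try:
        a = torch.randn(n, n, device="cuda")
        b = torch.randn(n, n, device="cuda")
        for _ in range(3):  # warm-up so one-time setup cost is not timed
            a @ b
        torch.cuda.synchronize()  # wait for queued GPU work before timing
        start = time.perf_counter()
        for _ in range(iters):
            a @ b
        torch.cuda.synchronize()  # wait for the timed kernels to finish
        return (time.perf_counter() - start) / iters
    finally:
        # Restore the caller's setting regardless of what happened above.
        torch.backends.cuda.matmul.allow_tf32 = previous


if __name__ == "__main__":
    for flag in (False, True):
        print(f"allow_tf32={flag}: {time_matmul(flag)} seconds per matmul")
```

Reporting the GPU model, driver, and PyTorch version alongside the timings makes results comparable across machines.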

ice051128 avatar Jan 06 '23 10:01 ice051128

You can think of the principle as replacing fp32 with tf32 to increase the calculation speed.

Any impact on output? Can you still reproduce old images with the reduced numerical accuracy? https://dev-discuss.pytorch.org/t/pytorch-and-tensorfloat32/504

ice051128 avatar Jan 06 '23 10:01 ice051128

Considering we already set this to true in https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/c9bded39ee05bd0507ccd27d2b674d86d6c0c8e8/modules/devices.py#L71, I would wager you would very much be able to reproduce old images with full accuracy.

AUTOMATIC1111 avatar Jan 06 '23 10:01 AUTOMATIC1111

This only works on RTX 3000-series and newer cards, but as AUTO said, it is already set to true, so this PR might not be needed.
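The hardware restriction can be checked at runtime. The sketch below is not the project's actual code; it assumes that TF32 requires CUDA compute capability 8.0 or higher (RTX 3000-series cards report 8.6, RTX 2000-series cards report 7.5). The function names supports_tf32 and enable_tf32_if_supported are hypothetical.

```python
def supports_tf32(compute_capability) -> bool:
    """Return True if a (major, minor) compute capability is 8.0 or higher.

    Assumption: TF32 tensor cores are available from compute capability 8.0.
    """
    major, _minor = compute_capability
    return major >= 8


def enable_tf32_if_supported() -> bool:
    """Turn on TF32 for matmul and cuDNN when the current GPU supports it.

    Returns True if the flags were set, False otherwise (no PyTorch,
    no CUDA device, or an older GPU).
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    if not supports_tf32(torch.cuda.get_device_capability()):
        return False
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    return True


if __name__ == "__main__":
    print("TF32 enabled:", enable_tf32_if_supported())
```

On older cards the flag is harmless but has no effect, so the check mainly serves to report whether a speedup can be expected.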

aliencaocao avatar Jan 06 '23 10:01 aliencaocao

This only works on RTX 3000 and newer cards, but as AUTO said, it is already true now, so this PR might not be needed

Why does this PR exist if it is already enabled by default?

ice051128 avatar Jan 06 '23 11:01 ice051128

Oh, you already added it. I should have looked at the file more closely. Sorry.

imgai-newbey avatar Jan 06 '23 12:01 imgai-newbey