To make image generation faster
torch.backends.cuda.matmul.allow_tf32 = True
Adding some more context would be great.
Adding torch.backends.cuda.matmul.allow_tf32 = True makes generation faster. In my tests it is slightly faster than xformers alone, and since the two optimizations are independent, you can combine them for roughly a 2 to 3 times speedup instead of using only one of them.
The principle is that fp32 matmuls are executed in tf32, which increases calculation speed.
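To make the fp32 vs. tf32 trade-off concrete, here is a small illustration of my own (not PyTorch's internals; TF32 keeps fp32's 8-bit exponent but only 10 mantissa bits, and real tensor cores round rather than truncate, so this sketch is approximate):

```python
# Sketch: simulate TF32's reduced mantissa by truncating an fp32 value's
# 23 mantissa bits down to 10 (clearing the low 13 bits). This is an
# illustration of the precision loss, not NVIDIA's actual rounding mode.
import struct

def truncate_to_tf32(x: float) -> float:
    """Truncate an fp32 value to TF32 precision (10 mantissa bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)  # clear the 13 lowest mantissa bits
    return struct.unpack("<I", struct.pack("<I", bits))[0] and \
        struct.unpack("<f", struct.pack("<I", bits))[0] or 0.0

print(truncate_to_tf32(1.5))            # short mantissa: survives exactly
print(truncate_to_tf32(1.0000001))      # below TF32 resolution: becomes 1.0
```

Matmul accumulation still happens in fp32, which is why the accuracy impact is usually small in practice.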
> Adding torch.backends.cuda.matmul.allow_tf32 = True makes creation faster.

Details? Benchmarks, environment, hardware configurations?
> You can think of the principle as replacing fp32 with tf32 to increase the calculation speed.

Any impact on output? Can you reproduce old images with the reduced numerical accuracy? https://dev-discuss.pytorch.org/t/pytorch-and-tensorfloat32/504
Considering we already set this to true in https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/c9bded39ee05bd0507ccd27d2b674d86d6c0c8e8/modules/devices.py#L71, I would wager you would very much be able to reproduce old images with full accuracy.
This only works on RTX 3000 series and newer cards, and as AUTO said, it is already set to true, so this PR may not be needed.
Why does this PR exist if it is already enabled by default?
Oh, you already added it. I should have looked at the file more carefully. Sorry.