stable-diffusion-webui
[Feature Request]: Add Facebook's Token Merging feature for faster inference time
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What would your feature do ?
Implement https://github.com/facebookresearch/ToMe which allows for faster image inference time
Proposed workflow
Maybe a CLI option? From what I read it decreases accuracy by a bit, so some people won't want to have it enabled.
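A minimal sketch of what such an opt-in switch could look like. The flag name `--token-merging-ratio` and the helper `maybe_apply_tome` are hypothetical, not existing webui options, and the patching function is passed in as a parameter so the sketch does not assume any particular ToMe implementation:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with a hypothetical opt-in flag for token merging."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; 0.0 means "leave the model unpatched".
    parser.add_argument("--token-merging-ratio", type=float, default=0.0)
    return parser


def maybe_apply_tome(model, ratio: float, patch_fn) -> bool:
    """Call patch_fn(model, ratio=ratio) only when the user opted in.

    patch_fn is injected by the caller, so this sketch does not depend
    on a specific ToMe implementation being installed.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    if ratio == 0.0:
        return False
    patch_fn(model, ratio=ratio)
    return True
```

With a default of 0.0 nothing changes for people who don't pass the flag, which matches the concern about accuracy loss.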
Additional information
See code in https://github.com/facebookresearch/ToMe/pull/7 and https://github.com/Birch-san/stable-diffusion/
Based on others' results and my own, it seems to speed up inference by ~20-25% at 512x512, which is quite significant, and it allows generating much larger images.
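A rough back-of-the-envelope sketch of why fewer tokens helps with both speed and image size. It assumes Stable Diffusion 1.x (8x latent downsampling, one token per latent pixel in the highest-resolution self-attention layers); the function names and the `ratio` parameter are illustrative only:

```python
def attention_tokens(width: int, height: int, downsample: int = 8) -> int:
    """Tokens seen by the highest-resolution self-attention layers.

    Assumes one token per latent pixel and an 8x latent downsampling
    factor, as in Stable Diffusion 1.x.
    """
    return (width // downsample) * (height // downsample)


def tokens_after_merging(tokens: int, ratio: float) -> int:
    """Tokens left when a fraction `ratio` of them is merged away."""
    return tokens - int(tokens * ratio)


full = attention_tokens(512, 512)          # 4096
merged = tokens_after_merging(full, 0.5)   # 2048
# Self-attention scores form a tokens x tokens matrix, so halving the
# token count shrinks that matrix by 4x.
print(full * full // (merged * merged))    # 4
```

The end-to-end gain is much smaller than 4x because attention is only part of each step and the merging itself has a cost, which would be consistent with the ~20-25% reported here.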
I did an extremely quick and dirty patch of webui and stable-diffusion repo with code from https://github.com/Birch-san/stable-diffusion to check ToMe. I also hard-coded doggettx's attention into the code because I'm not good enough to figure out how to add this properly - so for anyone using xformers, xformers will be faster for you than this patch.
https://gist.github.com/Yardanico/081e7e23ea1d51dd70f1a75a6df8b876 if you want to try.
I'm getting a 25% speed increase on my RX6700XT, from 6it/s to 7.5it/s, and I can also generate at higher resolutions while still being faster.
There is some accuracy loss though, but it largely depends on the prompt.
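For reference, the arithmetic behind the 25% figure, as a small sketch (the helper names are made up, and the 20-step count in the usage note is only an example value):

```python
def relative_speedup(before_its: float, after_its: float) -> float:
    """Fractional gain in iterations per second."""
    return after_its / before_its - 1.0


def seconds_for_steps(steps: int, its: float) -> float:
    """Wall-clock time for a given number of sampling steps."""
    return steps / its


print(f"{relative_speedup(6.0, 7.5):.0%}")  # 25%
```

At 20 sampling steps, for example, `seconds_for_steps(20, 6.0)` versus `seconds_for_steps(20, 7.5)` works out to roughly 3.3 s versus 2.7 s per image.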
very interesting work, hope we can enjoy this feature soon (with xformers if possible).
Also note that ToMe is drafting Stable Diffusion support (with examples and code "coming soon"); see also https://github.com/facebookresearch/ToMe/issues/4
Update: there is already a ToMe implementation for Stable Diffusion: https://github.com/dbolya/tomesd. It looks straightforward to support, since applying it to a model takes only one line of code:
```python
import tomesd

# Patch a Stable Diffusion model with ToMe for SD using a 50% merging ratio.
# Using the default options are recommended for the highest quality, tune ratio to suit your needs.
tomesd.apply_patch(model, ratio=0.5)

# However, if you want to tinker around with the settings, we expose several options.
# See docstring and paper for details. Note: you can patch the same model multiple times.
tomesd.apply_patch(model, ratio=0.9, sx=4, sy=4, max_downsample=2) # Extreme merging, expect diminishing returns
```
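To give a feel for what a "merging ratio" does, here is a toy sketch of similarity-based token merging in plain Python. It is a simplified illustration, not tomesd's implementation: the alternating src/dst split, the unweighted averaging, and the name `merge_tokens` are assumptions made for this example, and the real library does more (for instance, how destination tokens are chosen).

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_tokens(tokens, ratio):
    """Merge roughly `ratio` of the tokens into their nearest neighbours.

    Tokens at even positions are merge candidates (src); tokens at odd
    positions are kept (dst). The most similar src tokens are averaged
    into their best-matching dst token.
    """
    src = tokens[0::2]
    dst = [list(t) for t in tokens[1::2]]
    if not dst:
        return [list(t) for t in tokens]
    # Only src tokens can be merged away, so r is capped at len(src).
    r = min(int(len(tokens) * ratio), len(src))

    # Best dst match for every src token: (similarity, src index, dst index).
    matches = []
    for i, s in enumerate(src):
        j = max(range(len(dst)), key=lambda k: cosine(s, dst[k]))
        matches.append((cosine(s, dst[j]), i, j))
    matches.sort(reverse=True)
    merged_src = {i: j for _, i, j in matches[:r]}

    # Average each dst token with the src tokens merged into it.
    sums = [list(d) for d in dst]
    counts = [1] * len(dst)
    for i, j in merged_src.items():
        sums[j] = [a + b for a, b in zip(sums[j], src[i])]
        counts[j] += 1
    merged_dst = [[v / c for v in s] for s, c in zip(sums, counts)]

    kept_src = [list(s) for i, s in enumerate(src) if i not in merged_src]
    return kept_src + merged_dst
```

For example, `merge_tokens([[2.0, 0.0], [4.0, 0.0], [0.0, 2.0], [0.0, 4.0]], 0.5)` halves four tokens to two by averaging each pair that points in the same direction. With this simple alternating split at most half of the tokens can be merged.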
Update: There is already a PR working on this, see below
Update again: I implemented an extension to use ToMe (https://github.com/SLAPaper/a1111-sd-webui-tome), but it seems to give only a ~13% speedup when using batch size 8
Update: there is already a ToMe implementation for Stable Diffusion: https://github.com/dbolya/tomesd (snippet quoted above). I'll try to find a proper place to add that line of code myself, and will edit this comment if I make progress.
Working on this in AUTOMATIC1111/stable-diffusion-webui#9256