
Support for Imagen

Open lsabrinax opened this issue 2 years ago • 4 comments

Thanks for your nice work! Does tomesd only support Stable Diffusion, or can it also support other diffusion models such as Imagen?

lsabrinax avatar Jul 04 '23 09:07 lsabrinax

I believe Imagen uses just convnets for its unet, not a transformer like stable diffusion does. So in that respect, it can't be used like I use it for stable diffusion here. However, if the underlying network has self attention modules or uses a transformer in some way, then it's possible to use it. Unsure how (or if) that would apply to Imagen, though.

dbolya avatar Jul 04 '23 09:07 dbolya

Thanks for your reply, I'll try it on Imagen later. I tried it on Stable Diffusion first, running on an A30 GPU. With ratio=0.5, the time cost went from 1.4 to 0.939 (about 1.5x faster) and GPU memory went from 17648MB to 15576MB. The improvement is not as good as reported in the Readme. Also, with ratio=0.6, both the time cost and GPU memory are greater than with ratio=0.5. What could be the reason? How can I reproduce the reported results?

lsabrinax avatar Jul 04 '23 11:07 lsabrinax

and when I set ratio=0.6, the cost time and GPU memory are greater than ratio=0.5

That doesn't seem right. What environment are you in and how are you benchmarking this?

dbolya avatar Jul 04 '23 20:07 dbolya

I reran the following code on a V100 GPU to evaluate the performance. The torch version is 0.12.1 and the image size is 512*512.

import torch, tomesd
from diffusers import StableDiffusionPipeline
import time

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# Apply ToMe with a 50% merging ratio
tomesd.apply_patch(pipe, ratio=0.5) # Can also use pipe.unet in place of pipe here
infer_time, count = 0.0, 0.0
for i in range(200):
    start = time.time()
    image = pipe("a photo of an astronaut riding a horse on mars").images[0]
    infer_time += time.time() - start
    count += 1
image.save("astronaut.png")
print(f'average time: {infer_time / count}')

Without tomesd, GPU memory is 6040MB and the average time is 4.055s; with tomesd and ratio=0.5, GPU memory is 5216MB and the average time is 3.5749s. The speedup is not as large as the one reported in the table in the Readme.
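
For reference, the figures reported in this thread can be turned into speedup factors and memory reductions with a few lines of arithmetic. This is only a sketch: the helper names `speedup` and `reduction_pct` are made up for illustration, and the inputs are the numbers quoted above (A30 and V100, both at ratio=0.5).

```python
def speedup(before, after):
    """Return how many times faster `after` is than `before`."""
    return before / after


def reduction_pct(before, after):
    """Return the relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Numbers as reported in this thread (ratio=0.5 in both cases).
a30_speedup = speedup(1.4, 0.939)
a30_mem = reduction_pct(17648, 15576)
v100_speedup = speedup(4.055, 3.5749)
v100_mem = reduction_pct(6040, 5216)

print(f"A30:  {a30_speedup:.2f}x faster, {a30_mem:.1f}% less memory")
print(f"V100: {v100_speedup:.2f}x faster, {v100_mem:.1f}% less memory")
# A30:  1.49x faster, 11.7% less memory
# V100: 1.13x faster, 13.6% less memory
```

So the two runs show a similar memory reduction but quite different speedups.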

lsabrinax avatar Jul 05 '23 09:07 lsabrinax
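
When comparing runs like the ones above, it can help to use a timing helper that discards a few warm-up calls and optionally synchronizes the device around each timed call. The sketch below is one possible harness, not the method used for the Readme numbers; the function name `benchmark` and its parameters `warmup`, `iters`, and `sync` are assumptions made for illustration.

```python
import time
from statistics import mean


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Return the mean wall-clock seconds per call of `fn`.

    `warmup` calls are executed first and not timed. `sync`, if given,
    is called before starting and before stopping the timer (for GPU
    work this could be a device synchronization function).
    """
    for _ in range(warmup):
        fn()

    times = []
    for _ in range(iters):
        if sync is not None:
            sync()  # make sure earlier work has finished
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # make sure this call's work has finished
        times.append(time.perf_counter() - start)
    return mean(times)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a GPU.
    avg = benchmark(lambda: sum(range(100_000)), warmup=2, iters=5)
    print(f"average time: {avg:.6f}s")
```

With the script from this thread, the call might look like `benchmark(lambda: pipe("a photo of an astronaut riding a horse on mars"), sync=torch.cuda.synchronize)`, run once with and once without `tomesd.apply_patch`. For memory, `torch.cuda.max_memory_allocated()` reports the peak memory allocated by tensors, which can differ from the total a tool like nvidia-smi shows because PyTorch caches GPU memory.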