tomesd
Cannot reproduce the results of speedup
I wrote a script to test the speedup myself and ran it on an RTX 3090 GPU, but I could not reproduce the speedup reported in Table 4 of the paper.
```python
import os, random, time

import numpy as np
import torch
import tomesd
from diffusers import StableDiffusionPipeline, DDIMScheduler, PNDMScheduler


def seed_everything(seed):
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    random.seed(seed)
    np.random.seed(seed)


seed = 2024
seed_everything(seed)
batch_size = 1
num_inference_steps = 50
os.makedirs("results", exist_ok=True)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)

prompt = "a photo of an astronaut riding a horse on mars"
generator = torch.Generator().manual_seed(seed)

# Warm up the GPU so the timed runs are not skewed by initialization cost
print("Warming up the GPU")
for i in range(2):
    pipe([prompt] * batch_size, num_inference_steps=num_inference_steps)

# ------------------- original pipeline -------------------
start_time = time.time()
image = pipe([prompt] * batch_size, num_inference_steps=num_inference_steps,
             generator=generator).images[0]
end_time = time.time()
print("Origin Pipeline: {:.3f} seconds".format(end_time - start_time))
image.save("results/origin.png")

# ------------------- ToMe with a 50% merging ratio -------------------
generator = torch.Generator().manual_seed(seed)
tomesd.apply_patch(pipe, ratio=0.5)  # can also pass pipe.unet instead of pipe
start_time = time.time()
image = pipe([prompt] * batch_size, num_inference_steps=num_inference_steps,
             generator=generator).images[0]
end_time = time.time()
print("ToMe: {:.3f} seconds".format(end_time - start_time))
image.save("results/origin_ToMe.png")
```
The output from this script is:
```
Origin Pipeline: 2.728 seconds
ToMe: 2.581 seconds
```
In Table 4, the speedup is nearly 2x at r=50%, yet my test shows almost none. Why is that? Could you give me some advice?
- You might not be giving your GPU enough work. Try increasing the batch size and image size and see what happens.
- The results in the paper were obtained with the original Stable Diffusion repo, not diffusers. diffusers has implemented a number of optimizations of its own that eat into ToMe's speed-up, but you should still see gains at larger image sizes (see the sketch below).
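A minimal sketch of that suggestion, using the same model and prompt as the script above; the 768x768 resolution and batch size of 4 are arbitrary choices to load the GPU, not values from the paper:

```python
import time
import torch, tomesd
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"

def bench(label, batch_size=4, size=768):
    # Synchronize so the wall-clock window covers only this pipeline call
    torch.cuda.synchronize()
    start = time.time()
    pipe([prompt] * batch_size, height=size, width=size,
         num_inference_steps=50)
    torch.cuda.synchronize()
    print(f"{label}: {time.time() - start:.3f} seconds")

bench("warmup")                      # first call pays allocation/setup costs
bench("baseline")                    # no patch applied yet
tomesd.apply_patch(pipe, ratio=0.5)  # same 50% merging ratio as above
bench("ToMe")
```

The intuition is that at 512x512 with batch size 1, fixed overheads (kernel launches, the text encoder, the VAE) take up a large share of the runtime; scaling up the token count makes the attention cost, which is what ToMe reduces, the dominant term.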
Does this still work in 2024, with the SDXL or Lightning models?
Tested on SDXL, and I accidentally found that merging the middle blocks (1024 dimensions) gives much better results (although no big speedup).
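For anyone who wants to try the same thing, here is a minimal sketch. It assumes tomesd's `max_downsample` option is the right knob for reaching those deeper blocks (by default only the highest-resolution blocks are patched); the SDXL checkpoint name and the value of 8 are my assumptions, not something stated in this thread:

```python
import torch, tomesd
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# max_downsample=8 patches transformer blocks at every downsampling level,
# not just the highest-resolution ones (the default, max_downsample=1,
# matches the configuration used in the paper).
tomesd.apply_patch(pipe, ratio=0.5, max_downsample=8)

image = pipe("a photo of an astronaut riding a horse on mars",
             num_inference_steps=30).images[0]
image.save("xl_tome.png")
```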