
Very Slow first inference with diffusers 0.27.X

Open nesscube opened this issue 10 months ago • 5 comments

Describe the bug

Hello diffusers team! I've been facing an annoying issue since I upgraded diffusers to 0.27.X. The first call (and only the first) of pipeline(...) now takes a long time (around a minute) before inference starts. Moreover, the call to compel(prompts) takes 30 seconds, versus being nearly instant on 0.26.X.

This slowdown seems to happen only:

  • on the 0.27.X versions of diffusers
  • for XL models
  • if I load the pipeline with from_single_file from a safetensors file
  • if I run my inference in a Docker container (my Dockerfile starts with FROM python:3.10.6-slim-buster)

Unfortunately, I need all of these for my project...

Thanks a lot for your help!

Reproduction

from compel import Compel
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

# model_path, model_config, model_path_vae, compel, prompts, negative_prompts,
# width, height, num_inference_steps and seed are defined elsewhere in my code.
pipeline = StableDiffusionXLPipeline.from_single_file(
    model_path,
    torch_dtype=torch.float16,
    local_files_only=True,
    use_safetensors=True,
    add_watermarker=False,
    original_config_file=model_config,
    vae=AutoencoderKL.from_pretrained(model_path_vae, torch_dtype=torch.float16),
)
pipeline.enable_model_cpu_offload()

prompt_embeds, pooled_prompt_embeds = compel(prompts)
negative_prompt_embeds, negative_pooled_prompt_embeds = compel(negative_prompts)

result = pipeline(
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
    width=width,
    height=height,
    num_inference_steps=num_inference_steps,
    guidance_scale=6,
    num_images_per_prompt=1,
    generator=torch.Generator(device='cuda').manual_seed(seed),
)
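
For reference, here is a minimal timing sketch (assuming the same pipeline, compel, and prompt variables as above) that separates the Compel warm-up from the pipeline warm-up; on 0.27.X the first call of each step is the slow one:

import time

def timed(label, fn):
    # Time a single call and print its wall-clock duration.
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.1f}s")
    return result

# On 0.27.X only the first call of each step is slow; repeat calls are fast.
prompt_embeds, pooled_prompt_embeds = timed("compel, 1st call", lambda: compel(prompts))
prompt_embeds, pooled_prompt_embeds = timed("compel, 2nd call", lambda: compel(prompts))
negative_prompt_embeds, negative_pooled_prompt_embeds = compel(negative_prompts)

def run_pipeline():
    return pipeline(
        prompt_embeds=prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        negative_prompt_embeds=negative_prompt_embeds,
        negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
        num_inference_steps=num_inference_steps,
    ).images[0]

timed("pipeline, 1st call", run_pipeline)
timed("pipeline, 2nd call", run_pipeline)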

Logs

No response

System Info

  • diffusers version: 0.27.2
  • Platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.36
  • Python version: 3.10.14
  • PyTorch version (GPU?): 2.1.2+cu121 (True)
  • Huggingface_hub version: 0.22.2
  • Transformers version: 4.36.2
  • Accelerate version: 0.26.1
  • xFormers version: 0.0.23.post1
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: no

Who can help?

@yiyixuxu @sayakpaul @DN6

nesscube avatar Apr 26 '24 12:04 nesscube

Could you provide a reproducible snippet without Compel that demonstrates the inference slowdown?

sayakpaul avatar Apr 26 '24 13:04 sayakpaul

Also, FWIW, we run benchmarking tests regularly and do automated reporting: https://huggingface.co/datasets/diffusers/benchmarks/tree/main. As you can see there, we don't observe any unusual latency changes in the most commonly used pipelines.

sayakpaul avatar Apr 26 '24 13:04 sayakpaul

Hello

@nesscube You were running this in WSL or on the Windows desktop, right? I managed to reproduce it on my side, but it seems to be linked to model loading.

Reproduction:

# In Docker Desktop
docker run -it -v <windows_folder_path_with_model>/:/models/ python:3.10-slim bash
cd /models
pip install diffusers==0.27.2 torch transformers accelerate

Then run:


from datetime import datetime
import torch
from diffusers import StableDiffusionXLPipeline

model_path = "albedobond/albedobase-xl-v2.1.safetensors"
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
print(f"Loading pipeline: {datetime.utcnow()}"); pipeline = StableDiffusionXLPipeline.from_single_file(model_path, torch_dtype=torch.float16, local_files_only=True); print(f"Pipeline Loaded: {datetime.utcnow()}")
pipeline.enable_model_cpu_offload()

print(f"Generating: {datetime.utcnow()}"); image = pipeline(prompt=prompt).images[0]; print(f"Generated: {datetime.utcnow()}")

  • If I run the same thing with diffusers==0.26.3 there is no problem, even in WSL.
  • If I run the same thing but make sure the folder containing the model is inside WSL rather than on Windows, the problem disappears.

When mounting the model from a Windows folder, I notice that from_single_file is much faster: it returns almost immediately. But then generation takes ages. I guess the model just isn't in RAM yet, so it gets read from disk during generation.
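
One way to test that hypothesis (a rough sketch, nothing diffusers-specific) is to warm the OS page cache by reading the safetensors file once before loading the pipeline, and then check whether the first generation gets fast again:

from datetime import datetime

# Pre-read the whole file so its pages land in the OS cache before loading.
# If the slow first generation really comes from lazy reads of the mounted
# Windows folder, the cost should move to this pre-read step instead.
print(f"Pre-reading model file: {datetime.utcnow()}")
with open(model_path, "rb") as f:
    while f.read(64 * 1024 * 1024):  # read in 64 MiB chunks
        pass
print(f"Model file cached: {datetime.utcnow()}")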

@sayakpaul Do you know if there was any change in the model loading process in 0.27?

lerignoux avatar May 03 '24 07:05 lerignoux

We had this PR https://github.com/huggingface/diffusers/pull/6994 - is this related?

yiyixuxu avatar May 03 '24 22:05 yiyixuxu

We had this PR #6994 - is this related?

Yes, nice one. A bisect confirmed it; the issue is introduced by that PR's changes.

I tried to have a look today, but I will need more time to dig into the actual issue. Do you know anyone familiar with it who could help?
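
For anyone who wants to repeat the bisect, installing diffusers at an arbitrary commit inside the container works like this (the SHA is a placeholder, not the offending commit):

# Install diffusers at a specific commit to check whether it shows the slowdown.
pip install --force-reinstall "git+https://github.com/huggingface/diffusers.git@<commit_sha>"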

lerignoux avatar May 06 '24 02:05 lerignoux

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Sep 14 '24 15:09 github-actions[bot]