
Distributed inference for diffusers

maxjcohen opened this issue 2 years ago · 5 comments

Is your feature request related to a problem? Please describe. I would like to be able to use multiple GPUs to generate multiple images at a time when using the diffusers backend.

Describe the solution you'd like The accelerate library already provides such support; see for instance the article Distributed inference with multiple GPUs.

import torch
from accelerate import PartialState
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
)
distributed_state = PartialState()
pipeline.to(distributed_state.device)

with distributed_state.split_between_processes(["a dog", "a cat"]) as prompt:
    result = pipeline(prompt).images[0]
    result.save(f"result_{distributed_state.process_index}.png")
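For reference, the snippet above is meant to be run with the accelerate launcher, e.g. accelerate launch --num_processes=2 run_distributed.py (script name just an example), so that each process ends up on its own GPU with its own slice of the prompt list.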

I can help implement this feature; I'm just not sure where to start. From what I understand, the GenerateImage method only handles a single prompt at a time.

Additional context Thanks a lot for this great work!

maxjcohen · Sep 01 '23 13:09

@maxjcohen that'd be great! If you want to take a stab at it, here is the diffusers backend: https://github.com/go-skynet/LocalAI/blob/master/extra/grpc/diffusers/backend_diffusers.py#L215

mudler · Sep 01 '23 14:09

Changing the device the pipeline is moved to should be straightforward; however, I can't see how to adapt the second part, as accelerate expects us to list the prompts explicitly:

with distributed_state.split_between_processes(["a dog", "a cat"]) as prompt:
    result = pipeline(prompt).images[0]

As the GenerateImage method only handles a single prompt at a time, I suppose we have to adapt the code upstream, before the request reaches it.
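
To make the discussion concrete, here is a rough, hypothetical sketch of what I have in mind: a batched entry point splits the prompt list across processes and then falls back to the existing single-prompt path. The names generate_batch and generate_single are placeholders, not LocalAI's actual API.

from accelerate import PartialState

distributed_state = PartialState()

def generate_single(pipeline, prompt, dst):
    # One prompt, one image: roughly what GenerateImage does today.
    image = pipeline(prompt).images[0]
    image.save(dst)

def generate_batch(pipeline, prompts):
    # Move the pipeline to this process's GPU, then hand each process
    # its own slice of the prompt list.
    pipeline.to(distributed_state.device)
    with distributed_state.split_between_processes(prompts) as subset:
        for i, prompt in enumerate(subset):
            dst = f"result_{distributed_state.process_index}_{i}.png"
            generate_single(pipeline, prompt, dst)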

maxjcohen · Sep 01 '23 16:09

I would like to add that, as far as I know, accelerate works with all transformers models and plain PyTorch models as well.
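
For example, something along these lines should work for a plain transformers model (a rough sketch; the model name and prompts are just placeholders):

from accelerate import PartialState
from transformers import AutoModelForCausalLM, AutoTokenizer

state = PartialState()
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2").to(state.device)

# Same pattern as the diffusers example: each process gets a slice of the
# prompts and runs generation on its own device.
with state.split_between_processes(["a dog", "a cat"]) as prompts:
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(state.device)
    outputs = model.generate(
        **inputs, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id
    )
    print(state.process_index, tokenizer.batch_decode(outputs, skip_special_tokens=True))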

ecyht2 · Sep 05 '24 09:09

So, currently, does the diffusers backend not support multiple GPUs? Stable Diffusion failed on my device, and I found that it only uses a single 4090 GPU to load the diffusion model, whereas llama.cpp models can utilize multiple GPUs.
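
For what it's worth, splitting a single model across GPUs (the way llama.cpp does) is a different mechanism from the one-prompt-per-GPU data parallelism discussed above. I would have expected something like the following to spread the pipeline across GPUs (a sketch, assuming a recent diffusers release where from_pretrained accepts device_map="balanced"):

import torch
from diffusers import DiffusionPipeline

# "balanced" asks diffusers to spread the pipeline's components (UNet,
# text encoder, VAE) across the visible GPUs instead of a single device.
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    device_map="balanced",
)
image = pipeline("a dog").images[0]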

homjay · Nov 07 '24 15:11

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Dec 04 '25 02:12

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] · Dec 09 '25 02:12