
[Feature Request]: Include Onnx Pipeline + DirectML for Windows AMD card users (Img2Img and Inpainting are working - Diffusers 0.6.0)


Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What would your feature do?

As of Diffusers 0.6.0, the Diffusers ONNX pipeline supports Txt2Img, Img2Img, and Inpainting for AMD cards using DirectML. Would it be possible to include the ONNX pipeline now that Img2Img and Inpainting are working?

  • OnnxStableDiffusionPipeline
  • OnnxStableDiffusionImg2ImgPipeline
  • OnnxStableDiffusionInpaintPipeline

The ONNX pipeline supports Txt2Img, Img2Img, and Inpainting, and this process works on older AMD cards.

Proposed workflow

  1. On Windows systems with older AMD cards, the ONNX pipeline is set as the primary pipeline, or offered as an option, for image generation.
  2. During install, the following are downloaded and applied (a short usage sketch follows these commands):
pip install diffusers
pip install transformers
pip install onnxruntime
pip install onnx
pip install torch
pip install onnxruntime-directml --force-reinstall
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 --branch onnx --single-branch stable_diffusion_onnx
git clone https://huggingface.co/runwayml/stable-diffusion-inpainting --branch onnx --single-branch stable_diffusion_onnx_inpainting
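
As the promised usage sketch: once the packages and ONNX weights above are in place, the pipeline can be driven from diffusers like this (a minimal sketch, assuming diffusers >= 0.6.0 and onnxruntime-directml installed as above; the prompt and output filename are placeholders):

# Minimal usage sketch, not webui code: runs the ONNX weights cloned
# in step 2 on onnxruntime's DirectML backend.
from diffusers import OnnxStableDiffusionPipeline

pipe = OnnxStableDiffusionPipeline.from_pretrained(
    "./stable_diffusion_onnx",            # folder cloned in step 2
    provider="DmlExecutionProvider",      # DirectML backend of onnxruntime
)
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("output.png")

The Img2Img and Inpainting pipelines listed above are loaded the same way, with the same provider argument.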

Additional information

Examples: https://gist.github.com/averad/256c507baa3dcc9464203dc14610d674

averad · Oct 24 '22 08:10

Just a few days ago, NMKD GUI got ONNX (and AMD GPU) support. This web UI should be able to do the same, right!?

cyatarow · Dec 17 '22 12:12

The ONNX pipeline is so good, and performs faster than torch in CPU-only mode. It would be awesome to have this!

ClashSAN · Dec 19 '22 06:12

Yep, I'm all for this feature. 👍 The ONNX pipeline should really be added. Right now I have to switch to booting into Ubuntu whenever I want to use this WebUI.

THEGOLDENPRO · Jan 22 '23 14:01

With pytorch-directml 1.13, we could add this feature without using ONNX. All we need is to modify get_optimal_device_name (in devices.py) and add:

if has_dml():
    return "dml"

"dml" cannot be referenced by name, so you should also modify get_optimal_device (also in devices.py), adding:

if get_optimal_device_name() == "dml":
    import torch_directml
    return torch_directml.device()

and modify sd_models.py to avoid using "dml" as a string, changing the line from

device = map_location or shared.weight_load_location or devices.get_optimal_device_name()

to

device = map_location or shared.weight_load_location or devices.get_optimal_device()
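
Put together, the two patched devices.py functions would look roughly like this (a sketch only; the surrounding upstream branches are paraphrased and may differ from the actual file):

def get_optimal_device_name():
    if torch.cuda.is_available():
        return get_cuda_device_string()   # paraphrased upstream branch
    if has_dml():
        return "dml"                      # new DirectML branch
    return "cpu"

def get_optimal_device():
    if get_optimal_device_name() == "dml":
        # torch.device("dml") is not a valid device string, so return
        # the device object from torch_directml instead.
        import torch_directml
        return torch_directml.device()
    return torch.device(get_optimal_device_name())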

Finally, add a DML workaround to devices.py:

# DML workaround: cumsum does not work on the DirectML device, so
# route it through the CPU and move the result back to the original device.
if has_dml():
    orig_cumsum = torch.cumsum
    orig_Tensor_cumsum = torch.Tensor.cumsum
    torch.cumsum = lambda input, *args, **kwargs: orig_cumsum(input.to("cpu"), *args, **kwargs).to(input.device)
    torch.Tensor.cumsum = lambda self, *args, **kwargs: orig_Tensor_cumsum(self.to("cpu"), *args, **kwargs).to(self.device)

You could define has_dml() wherever suits your needs.
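
For example, a minimal has_dml() could just probe for the torch-directml package (one possible definition, not the only way):

import importlib.util

def has_dml():
    # DirectML is considered available when torch_directml is importable.
    return importlib.util.find_spec("torch_directml") is not None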

To install the environment:

conda create -n stable_diffusion_directml python=3.10
conda activate stable_diffusion_directml
conda install pytorch=1.13.1 cpuonly -c pytorch
pip install torch-directml==0.1.13.1.dev230119 gfpgan clip
pip install git+https://github.com/mlfoundations/open_clip.git@bb6e834e9c70d9c27d0dc3ecedeebeaeb1ffad6b
# Launch to clone packages including requirements
python .\launch.py --skip-torch-cuda-test --lowvram --precision full --no-half
# Install requirements
pip install -r repositories\CodeFormer\requirements.txt
pip install -r requirements.txt
# Start
python .\launch.py --skip-torch-cuda-test --lowvram --precision full --no-half
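
Before launching, a quick sanity check (my own addition, not part of the steps above) confirms the DirectML device actually executes ops:

import torch
import torch_directml

dml = torch_directml.device()
x = torch.ones(2, 2, device=dml)
print((x + x).to("cpu"))   # prints a tensor of 2s if DirectML works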

Here are examples: Sample1, Sample2

simonlsp · Feb 06 '23 16:02

@simonlsp that is so awesome. Though the ONNX pipeline does provide some benefits for its CPU users: you can quantize models and run with extremely little RAM, at double the speed of fp16. The ONNX CPU inference from diffusers is already twice as fast as current PyTorch in CPU mode.
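
For anyone curious, quantizing an exported ONNX model is a one-liner with onnxruntime's quantization tooling; a hedged sketch (the file paths are placeholders for wherever the exported UNet lives):

from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="stable_diffusion_onnx/unet/model.onnx",         # placeholder path
    model_output="stable_diffusion_onnx/unet/model_quant.onnx",  # placeholder path
    weight_type=QuantType.QUInt8,   # 8-bit weights shrink RAM use
)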

ClashSAN · Feb 06 '23 17:02

@simonlsp wow, I didn't know it was this easy to get it running with PyTorch DirectML. I had a lot of difficulty getting this webui to run on Ubuntu with my RX 570, and it still doesn't work.

I'm gonna try this method in my spare time. Thank you! Also, sd-webui devs, this should really be integrated. ⭐

THEGOLDENPRO · Feb 07 '23 00:02

Came across this YouTube video tutorial while trying to figure out AMD with Windows, because the auto-installer in the main Readme tried to install CUDA and errored out.

Hope this proves helpful for any wizards out there working on this task.

mr-september · Feb 10 '23 04:02

@ClashSAN @THEGOLDENPRO @mr-september I created another issue requesting the pytorch-directml feature; since this issue thread focuses on ONNX, we might be off-topic here.

https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7600

simonlsp · Feb 10 '23 06:02

Any updates on this issue regarding ONNX support, or maybe pytorch-directml? (I apologize in advance for going off-topic.)

THEGOLDENPRO · Feb 21 '23 01:02