stable-diffusion-webui
[Feature Request]: Include Onnx Pipeline + DirectML for Windows AMD card users (Img2Img and Inpainting are working - Diffusers 0.6.0)
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What would your feature do?
As of Diffusers 0.6.0, the Diffusers ONNX pipeline supports Txt2Img, Img2Img, and Inpainting for AMD cards using DirectML. Would it be possible to include the ONNX pipeline now that Img2Img and Inpainting are working?
- OnnxStableDiffusionPipeline
- OnnxStableDiffusionImg2ImgPipeline
- OnnxStableDiffusionInpaintPipeline
The ONNX pipeline supports Txt2Img, Img2Img, and Inpainting, and this process works on older AMD cards.
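For reference, here is a minimal Txt2Img sketch using the diffusers ONNX pipeline with the DirectML execution provider (the model id, prompt, and output path are placeholders):
# Txt2Img via the ONNX pipeline on DirectML (assumes diffusers >= 0.6.0 and onnxruntime-directml are installed)
from diffusers import OnnxStableDiffusionPipeline
pipe = OnnxStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="onnx",
    provider="DmlExecutionProvider",  # run inference through DirectML
)
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("output.png")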
Proposed workflow
- On Windows systems with older AMD cards, the ONNX pipeline is set as the primary pipeline, or offered as an option, for image generation.
- During install, the following packages and model branches are downloaded and applied (a usage sketch follows these commands):
pip install diffusers
pip install transformers
pip install onnxruntime
pip install onnx
pip install torch
pip install onnxruntime-directml --force-reinstall
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 --branch onnx --single-branch stable_diffusion_onnx
git clone https://huggingface.co/runwayml/stable-diffusion-inpainting --branch onnx --single-branch stable_diffusion_onnx_inpainting
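Once those packages and model branches are in place, Img2Img could be exercised against the locally cloned weights roughly as follows (paths and parameters are illustrative, and the name of the starting-image argument has changed across diffusers releases):
# Img2Img via the ONNX pipeline, using the cloned stable_diffusion_onnx directory
from PIL import Image
from diffusers import OnnxStableDiffusionImg2ImgPipeline
pipe = OnnxStableDiffusionImg2ImgPipeline.from_pretrained(
    "./stable_diffusion_onnx",
    provider="DmlExecutionProvider",
)
init = Image.open("input.png").convert("RGB").resize((512, 512))
# diffusers 0.6.x takes init_image=; newer releases renamed it to image=
result = pipe(prompt="a fantasy landscape", init_image=init, strength=0.75).images[0]
result.save("img2img_output.png")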
Additional information
Examples: https://gist.github.com/averad/256c507baa3dcc9464203dc14610d674
Just a few days ago, NMKD GUI got ONNX (and AMD GPU) support. This web UI should be able to do the same, right!?
The ONNX pipeline is really good, and it performs faster than torch in CPU-only mode. It would be awesome to have this!
Yep, I'm for this feature. 👍 The ONNX pipeline should really be added. Right now I have to boot into Ubuntu whenever I want to use this WebUI.
With pytorch-directml 1.13, we could add this feature without using ONNX. All we need to do is modify get_optimal_device_name (in devices.py) and add
    if has_dml():
        return "dml"
"dml" cannot be referenced by name, so you should also modify get_optimal_device (also in devices.py), adding
    if get_optimal_device_name() == "dml":
        import torch_directml
        return torch_directml.device()
and modify sd_models.py to avoid using "dml" as a string, changing the line from
device = map_location or shared.weight_load_location or devices.get_optimal_device_name()
to
device = map_location or shared.weight_load_location or devices.get_optimal_device()
Finally, add a DML workaround to devices.py:
# DML workaround: run cumsum on the CPU and move the result back to the original device
if has_dml():
    orig_cumsum = torch.cumsum
    orig_Tensor_cumsum = torch.Tensor.cumsum
    torch.cumsum = lambda input, *args, **kwargs: orig_cumsum(input.to("cpu"), *args, **kwargs).to(input.device)
    torch.Tensor.cumsum = lambda self, *args, **kwargs: orig_Tensor_cumsum(self.to("cpu"), *args, **kwargs).to(self.device)
You could define has_dml() wherever it suits your needs.
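For example, a minimal has_dml() could simply probe for the torch_directml package (the helper name and placement are just a suggestion):
def has_dml() -> bool:
    # Treat DirectML as available when torch_directml imports and reports a device;
    # older builds without is_available() are assumed to be usable.
    try:
        import torch_directml
    except ImportError:
        return False
    return getattr(torch_directml, "is_available", lambda: True)()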
To install the environment:
conda create -n stable_diffusion_directml python=3.10
conda activate stable_diffusion_directml
conda install pytorch=1.13.1 cpuonly -c pytorch
pip install torch-directml==0.1.13.1.dev230119 gfpgan clip
pip install git+https://github.com/mlfoundations/open_clip.git@bb6e834e9c70d9c27d0dc3ecedeebeaeb1ffad6b
# Launch to clone packages including requirements
python .\launch.py --skip-torch-cuda-test --lowvram --precision full --no-half
# Install requirements
pip install -r repositories\CodeFormer\requirements.txt
pip install -r requirements.txt
# Start
python .\launch.py --skip-torch-cuda-test --lowvram --precision full --no-half
Here are examples
@simonlsp That is awesome. Still, the ONNX pipeline does provide some benefits for CPU-only users: you can quantize models to run in very little RAM at roughly double the speed of fp16, and the ONNX CPU inference from diffusers is already about twice as fast as the current PyTorch CPU mode.
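For context, a quantization pass over one of the exported ONNX components could look something like this with onnxruntime's dynamic quantization (the paths are placeholders, and very large models may also need external-data handling):
# Hypothetical sketch: quantize the exported UNet weights to 8 bits
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic(
    model_input="stable_diffusion_onnx/unet/model.onnx",
    model_output="stable_diffusion_onnx/unet/model_quant.onnx",
    weight_type=QuantType.QUInt8,  # smaller weights, faster CPU inference
)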
@simonlsp Wow, I didn't know it was this easy to get it running with PyTorch DirectML. I had a lot of difficulty getting this webui to run on Ubuntu with my RX 570, and it still doesn't work.
I'm going to try this method in my spare time. Thank you, and sd-webui devs, this should really be integrated. ⭐
Came across this YouTube video tutorial while trying to figure out AMD with Windows, because the auto-installer in the main README tried to install CUDA and errored out.
Hope this comes in handy for any wizards out there working on this task.
@ClashSAN @THEGOLDENPRO @mr-september I created another issue requesting this feature for pytorch-directml; since this issue thread focuses on ONNX, we might be off-topic here.
https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7600
Any updates on this issue regarding ONNX support, or maybe PyTorch-DirectML? (I apologize in advance for going off topic.)