stable-diffusion-webui
[Feature Request]: upgrade to torch 2.2.0+cu121; scaled_dot_product_attention (SDPA) now supports FlashAttention-2, giving better performance with PyTorch 2.2.0 and CUDA 12.1
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What would your feature do?
- torch: 2.2.0+cu121
- better performance
- FP8
- slightly less memory consumption
- scaled_dot_product_attention ((SDPA) now supports FlashAttention-2)
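For context, here is a minimal pure-NumPy reference (not webui code, just an illustration) of what `torch.nn.functional.scaled_dot_product_attention` computes; FlashAttention-2 is a faster, memory-efficient kernel for the same math:

```python
import numpy as np

def scaled_dot_product_attention_ref(q, k, v):
    """Reference of what torch.nn.functional.scaled_dot_product_attention
    computes (no mask, no dropout): softmax(q @ k^T / sqrt(d)) @ v.
    FlashAttention-2 returns the same result with less memory traffic."""
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)   # (..., L_q, L_k)
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ v                                 # (..., L_q, d_v)

# tiny smoke test: output shape matches the queries
q = np.random.default_rng(0).normal(size=(2, 4, 8))    # (batch, L_q, d)
k = np.random.default_rng(1).normal(size=(2, 5, 8))    # (batch, L_k, d)
v = np.random.default_rng(2).normal(size=(2, 5, 8))
out = scaled_dot_product_attention_ref(q, k, v)
print(out.shape)  # (2, 4, 8)
```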
Proposed workflow
torch: 2.2.0+cu121
- open `launch_utils.py` from `stable-diffusion-webui\modules`
- at line 315, change `prepare_environment()` so it installs torch 2.2.0 with CUDA 12.1 wheels:

```python
def prepare_environment():
    torch_index_url = os.environ.get('TORCH_INDEX_URL', "https://download.pytorch.org/whl/cu121")
    torch_command = os.environ.get('TORCH_COMMAND', f"pip install torch==2.2.0 torchvision==0.17.0 --extra-index-url {torch_index_url}")
    if args.use_ipex:
        if platform.system() == "Windows":
            # The "Nuullll/intel-extension-for-pytorch" wheels were built from IPEX source for Intel Arc GPU: https://github.com/intel/intel-extension-for-pytorch/tree/xpu-main
            # This is NOT an Intel official release so please use it at your own risk!!
            # See https://github.com/Nuullll/intel-extension-for-pytorch/releases/tag/v2.0.110%2Bxpu-master%2Bdll-bundle for details.
            #
            # Strengths (over official IPEX 2.0.110 windows release):
            #   - AOT build (for Arc GPU only) to eliminate JIT compilation overhead: https://github.com/intel/intel-extension-for-pytorch/issues/399
            #   - Bundles minimal oneAPI 2023.2 dependencies into the python wheels, so users don't need to install oneAPI for the whole system.
            #   - Provides a compatible torchvision wheel: https://github.com/intel/intel-extension-for-pytorch/issues/465
            # Limitation:
            #   - Only works for python 3.10
            url_prefix = "https://github.com/Nuullll/intel-extension-for-pytorch/releases/download/v2.1.10%2Bxpu-master%2Bdll-bundle"
            torch_command = os.environ.get('TORCH_COMMAND', f"pip install {url_prefix}/torch-2.1.0a0+cxx11.abi-cp311-cp311-win_amd64.whl {url_prefix}/torchvision-0.16.0a0+cxx11.abi-cp311-cp311-win_amd64.whl {url_prefix}/intel_extension_for_pytorch-2.1.10+xpu-cp311-cp311-win_amd64.whl")
```
- delete the `venv` folder so the new torch and torchvision versions are installed on the next launch
- implement scaled_dot_product_attention (SDPA) as the attention optimization
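Since the snippet above reads the pins with `os.environ.get`, an alternative to editing `launch_utils.py` is to override the `TORCH_INDEX_URL` and `TORCH_COMMAND` environment variables before launching. A sketch (Linux-style shell shown; launch script name may differ on your setup):

```shell
# Override the pins without editing launch_utils.py;
# prepare_environment() reads both variables via os.environ.get.
export TORCH_INDEX_URL="https://download.pytorch.org/whl/cu121"
export TORCH_COMMAND="pip install torch==2.2.0 torchvision==0.17.0 --extra-index-url ${TORCH_INDEX_URL}"
echo "${TORCH_COMMAND}"
# then launch as usual, e.g. ./webui.sh
```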
Additional information
In addition to scaled_dot_product_attention, I would also like to update all of the outdated dependencies.
requirements.txt:
transformers==4.37.2
requirements_versions.txt:
Pillow==10.2.0
accelerate==0.26.1
basicsr==1.4.2
blendmodes==2024.1
clean-fid==0.1.35
einops==0.4.1
fastapi==0.94.0
gfpgan==1.3.8
gradio==3.41.2
httpcore==0.15
inflection==0.5.1
jsonmerge==1.8.0
kornia==0.6.7
lark==1.1.2
numpy==1.26.3
omegaconf==2.3.0
open-clip-torch==2.24.0
piexif==1.1.3
psutil==5.9.5
pytorch_lightning==2.1.3
realesrgan==0.3.0
resize-right==0.0.2
safetensors==0.4.2
scikit-image==0.21.0
timm==0.9.2
tomesd==0.1.3
torch==2.2.0
torchdiffeq==0.2.3
torchsde==0.2.6
transformers==4.37.2
httpx==0.24.1
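To see which of these pins differ from what is currently installed in the venv, a small stdlib-only sketch like the following could help (helper names are mine, not part of webui):

```python
import importlib.metadata

def parse_pins(text):
    """Parse 'name==version' lines from a requirements file into a dict,
    skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins

def check_pins(pins):
    """Return {name: (installed_or_None, wanted)} for every pin that is
    missing or installed at a different version."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches

pins = parse_pins("Pillow==10.2.0\ntorch==2.2.0\n# comment\n")
print(pins)  # {'Pillow': '10.2.0', 'torch': '2.2.0'}
```

In practice you would feed it `Path("requirements_versions.txt").read_text()` and print the mismatches before upgrading.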
But for the updated pytorch_lightning to work, you will also need to change the import `from pytorch_lightning.utilities.distributed import rank_zero_only` to `from pytorch_lightning.utilities.rank_zero import rank_zero_only` in sd_hijack_ddpm_v1.py, ddpm_edit.py and ddpm.py.
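The rename can be applied mechanically; here is a hedged sketch (the file paths are an assumption, adjust them to where these files live in your checkout):

```python
from pathlib import Path

OLD = "from pytorch_lightning.utilities.distributed import rank_zero_only"
NEW = "from pytorch_lightning.utilities.rank_zero import rank_zero_only"

def fix_rank_zero_import(path):
    """Rewrite the outdated pytorch_lightning import in one file.
    Returns True if the file was changed, False if the old import was absent."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    if OLD not in text:
        return False
    p.write_text(text.replace(OLD, NEW), encoding="utf-8")
    return True

# Paths below are illustrative -- the three files sit in different folders
# of the repository, so point at the real locations in your checkout:
# for name in ["sd_hijack_ddpm_v1.py", "ddpm_edit.py", "ddpm.py"]:
#     fix_rank_zero_import(name)
```

Re-running it is safe: once a file already uses the new import, the function just returns `False`.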