
Intel ARC Support

Open linus378 opened this issue 1 year ago • 78 comments

I was wondering if Intel ARC GPUs work with this. I could not find anything about it here.

linus378 avatar Apr 26 '23 15:04 linus378

Also, I wonder if this could support 2 GPUs so you don't have to offload anything into RAM, such as an Arc A770 and an RX 6600.

linus378 avatar Apr 26 '23 15:04 linus378

To my knowledge it doesn't currently have support for oneAPI or OpenVINO; I own an Intel Arc GPU myself.

dan9070 avatar Apr 27 '23 23:04 dan9070

It doesn't, unfortunately. I really wish it did, though, as I have a dual A770 system myself (and these cards have a lot of VRAM for the price, plus good low-precision AI accelerators, etc.). For now I'm running on CPU, which is, of course, horribly slow.

However, one issue is that Intel's PyTorch support for its GPUs requires a special build based on PyTorch 1.10 (see https://www.intel.com/content/www/us/en/developer/articles/technical/introducing-intel-extension-for-pytorch-for-gpus.html), while this project uses PyTorch 2.0.0. As soon as Intel GPU support for PyTorch 2.0.0 comes out, I'm hoping support can be extended here (if I can find time, maybe I'll even be able to contribute some patches). For CPU, PyTorch 2.0.0 is already supported: https://intel.github.io/intel-extension-for-pytorch/latest/tutorials/releases.html

In the meantime, it would be great if the readme could at least be updated to say WHAT GPUs are supported.

BTW, the one-click installer also fails if you don't have an NVIDIA GPU, even if you select "None". I had to go the git clone route.

mmccool avatar May 02 '23 17:05 mmccool

Multi-GPU support for multiple Intel GPUs would, of course, also be nice. Multi-GPU is already supported for other cards, so it should not (in theory) be a problem. I personally don't really care about mixing GPUs from different vendors, though :)

A bonus would be the ability to use Intel integrated graphics; it has limited VRAM, but it might be good enough for some simple things.

mmccool avatar May 02 '23 17:05 mmccool

Would love to see this as well. With its power and amount of VRAM, the Arc is a great little card for those of us who do more compute stuff than gaming, especially considering the price.

rattlecanblack avatar May 11 '23 17:05 rattlecanblack

Intel has released torch 2.0 support for Arc GPUs: https://github.com/intel/intel-extension-for-pytorch/releases/tag/v2.0.110%2Bxpu

miraged3 avatar Aug 14 '23 13:08 miraged3

Does the release of pytorch 2 support move things forward for Arc support?

itlackey avatar Aug 30 '23 02:08 itlackey

I have created a pinned thread for Intel Arc discussion and welcome you to move the discussion there: https://github.com/oobabooga/text-generation-webui/issues/3761

To my knowledge, llama-cpp-python should work with GPU acceleration on Intel Arc as long as you compile it with CLBLAST. See https://github.com/oobabooga/text-generation-webui#amd-metal-intel-arc-and-cpus-without-avcx2
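(If memory serves, the key step is reinstalling llama-cpp-python with the CLBlast backend enabled, something along the lines of CMAKE_ARGS="-DLLAMA_CLBLAST=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --no-cache-dir, but the linked README section has the exact command.)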

oobabooga avatar Aug 30 '23 17:08 oobabooga

You rock! Thank you for all the hard work on this project!

itlackey avatar Sep 01 '23 01:09 itlackey

@oobabooga Intel Arc GPU support is in the pipeline; the support integration will be started in 2-3 weeks' time (by myself). There are some other items in the pipeline at Intel which we are covering, and we plan to add this for our GPUs soon.

abhilash1910 avatar Sep 08 '23 10:09 abhilash1910

@abhilash1910 thanks for the info. For XPU inference on transformers, is it currently enough to do

model.to(torch.device('xpu'))

or similar, like here?

Does any special pytorch import command have to be made?

oobabooga avatar Sep 23 '23 12:09 oobabooga

I found this while researching how this all works.

https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/examples.html

It looks like there shouldn't be much to change, but I'm new to LLM/AI development. So I may be missing something.
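Paraphrasing the example on that page (so treat this as a rough sketch rather than something I've verified, and assuming model and input_ids come from the usual transformers AutoModelForCausalLM / AutoTokenizer load), the pattern seems to be roughly:

import torch
import intel_extension_for_pytorch as ipex  # importing this registers the 'xpu' device with torch

model = model.eval().to('xpu')   # move the already-loaded HF model onto the Arc GPU
model = ipex.optimize(model)     # optional IPEX kernel/layout optimizations for inference

input_ids = input_ids.to('xpu')  # inputs have to live on the same device as the model
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=64)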

itlackey avatar Sep 23 '23 15:09 itlackey

Thanks @itlackey. I guess it should be a few changed lines (for the transformers loader):

  1. model = model.to("xpu") in modules.models.huggingface_loader
  2. return input_ids.to(torch.device('xpu')) in modules.text_generation.encode.

It would be nice if someone could test this.

oobabooga avatar Sep 23 '23 17:09 oobabooga

I'll have time in a few days and will give it a shot. We may also need to make some changes to the installer and/or docker image to load the Intel libs and driver and recompile llama.cpp to get xpu to work. I was able to do this with a docker image for FastChat and llama.cpp. We should be able to do the same for textgen.

itlackey avatar Sep 23 '23 18:09 itlackey

Good to know the interest; thanks @oobabooga @itlackey (it helps to determine priority). I will add in the changes starting tomorrow (25th Sept) and they can be tested.

Thanks @itlackey. I guess it should be a few changed lines (for the transformers loader):

  1. model = model.to("xpu") in modules.models.huggingface_loader
  2. return input_ids.to(torch.device('xpu')) in modules.text_generation.encode.

It would be nice if someone could test this.

abhilash1910 avatar Sep 24 '23 06:09 abhilash1910

Awesome @abhilash1910 :)

oobabooga avatar Sep 24 '23 12:09 oobabooga

Hello, I just purchased an Intel Arc A770 16GB. When it arrives (in a week) I will be willing to help test stuff on Linux. In general, if Arc GPUs become usable, it could be a really nice option, especially if multi-GPU is possible.

Yorizuka avatar Sep 30 '23 11:09 Yorizuka

Small update: the GPU has arrived; I will install it into my PC when I have time. I am excited to start playing around with LLMs on my own PC.

Yorizuka avatar Oct 08 '23 06:10 Yorizuka

Thanks @itlackey. I guess it should be a few changed lines (for the transformers loader):

  1. model = model.to("xpu") in modules.models.huggingface_loader
  2. return input_ids.to(torch.device('xpu')) in modules.text_generation.encode.

It would be nice if someone could test this.

Doesn't change anything (yet). I'm using Intel Iris Xe Graphics (not very good, I know) on WSL2. I'll test some more stuff out.
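One sanity check I'm going to run first (just my guess at the right checks, based on the IPEX docs) is whether the XPU build of torch is actually the one being imported:

import torch
import intel_extension_for_pytorch as ipex  # without this import, torch never gets an 'xpu' attribute

print(torch.__version__)   # the stock pip wheel won't expose torch.xpu at all
print(ipex.__version__)    # the Arc/GPU build should carry a "+xpu" suffix, e.g. 2.0.110+xpu
print(hasattr(torch, 'xpu') and torch.xpu.is_available())
print(torch.xpu.device_count() if hasattr(torch, 'xpu') else 0)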

TheRealUnderscore avatar Oct 12 '23 08:10 TheRealUnderscore

Not sure if this is user error (I'm new to this) or an actual issue, but I'm getting errors about CUDA while trying to load a model. I find this really odd, especially because I chose the IPEX option during the ./start_linux.sh first-time install.

2023-10-15 16:30:34 INFO:Loading HuggingFaceH4_zephyr-7b-alpha...
Loading checkpoint shards: 100%|██████████████████| 2/2 [02:04<00:00, 62.41s/it]
2023-10-15 16:32:39 ERROR:Failed to load the model.
Traceback (most recent call last):
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/modules/ui_model_menu.py", line 201, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/modules/models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/modules/models.py", line 141, in huggingface_loader
    model = model.cuda()
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2168, in cuda
    return super().cuda(*args, **kwargs)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 918, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 918, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Yorizuka avatar Oct 15 '23 20:10 Yorizuka

@Yorizuka can you try making those changes to modules/models.py and modules/text_generation.py?

diff --git a/modules/models.py b/modules/models.py
index 5bd9db74..c376c808 100644
--- a/modules/models.py
+++ b/modules/models.py
@@ -137,6 +137,8 @@ def huggingface_loader(model_name):
         if torch.backends.mps.is_available():
             device = torch.device('mps')
             model = model.to(device)
+        elif hasattr(torch, 'xpu') and torch.xpu.is_available():
+            model = model.to('xpu')
         else:
             model = model.cuda()
 
diff --git a/modules/text_generation.py b/modules/text_generation.py
index 0f24dc58..295c7cdd 100644
--- a/modules/text_generation.py
+++ b/modules/text_generation.py
@@ -132,6 +132,8 @@ def encode(prompt, add_special_tokens=True, add_bos_token=True, truncation_lengt
     elif torch.backends.mps.is_available():
         device = torch.device('mps')
         return input_ids.to(device)
+    elif hasattr(torch, 'xpu') and torch.xpu.is_available():
+        return input_ids.to('xpu')
     else:
         return input_ids.cuda()

oobabooga avatar Oct 15 '23 21:10 oobabooga

I applied the patch, same issue.

2023-10-16 02:25:12 ERROR:Failed to load the model.
Traceback (most recent call last):
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/modules/ui_model_menu.py", line 201, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/modules/models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/modules/models.py", line 143, in huggingface_loader
    model = model.cuda()
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2168, in cuda
    return super().cuda(*args, **kwargs)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 918, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 918, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

To confirm I did the patch correctly, here is the git status:

On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   modules/models.py
	modified:   modules/text_generation.py

no changes added to commit (use "git add" and/or "git commit -a")

And my git rev-parse HEAD output: d331501ebc83e80c5d8f49c3e7c547730afff5c2

Yorizuka avatar Oct 16 '23 06:10 Yorizuka

print(f"generations: input_ids set! model class: {shared.model.__class__.__name__} | has xpu {hasattr(torch, 'xpu')}") in text-generation/modules prints: image (using a GGUF model, though I'm trying to get CBLAS set up right now though, which is probably why llama.cpp is messing up)

So I uninstalled the torch and torchvision installed by the one-click installer and reinstalled IPEX, which resulted in an unidentified .so error. Putting export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/anaconda3/envs/tg/lib in ~/.bashrc fixes that. But I still get the same message: (screenshot) (I changed the message slightly, my apologies)

And to add onto what @Yorizuka mentioned, trying to run a GPTQ model in Transformers also gives this error: RuntimeError: GPU is required to quantize or run quantize model. alongside WARNING:torch.cuda.is_available() returned False. This means that no GPU has been detected. Falling back to CPU mode.

TheRealUnderscore avatar Oct 16 '23 06:10 TheRealUnderscore

I think the issue described in this comment https://github.com/oobabooga/text-generation-webui/issues/3761#issuecomment-1763345289 is likely related to the issue we are having here.

Yorizuka avatar Oct 16 '23 19:10 Yorizuka

@TheRealUnderscore about the transformers error, can you check if it works after this commit?

https://github.com/oobabooga/text-generation-webui/commit/8ea554bc19bf7df2a08ab7a23322f69829b140db

oobabooga avatar Oct 16 '23 19:10 oobabooga

@oobabooga (screenshot) It seems the error has something to do with what Yorizuka said. hasattr(torch, 'xpu') returned False in my previous message, so it's not detecting PyTorch XPU whatsoever.

These were my PyTorch settings (via print(torch.__config__.show())) before reinstalling 2.0.1a0:

PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

(Screenshot with more readable build settings.)

And these are my 2.0.1a0 settings. Now lots of things have changed:

PyTorch built with:
  - GCC 11.2
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2023.2-Product Build 20230613 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/gcc-toolset-11/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

(Screenshot with more readable build settings.) All the differences (I've no clue what could help and what couldn't, so I'm just listing them all):

  • GCC ver upgraded 9.3 -> 11.2
  • MKL ver upgraded
  • MKL-DNN ver downgraded
  • CPU extensions downgraded AVX512 -> AVX2
  • CUDA and CUDNN vers removed
  • CXX_COMPILER devtoolset-9 -> gcc-toolset-11
  • In CXX-FLAGS:
    • D_GLIBCXX_USE_CXX11_ABI state 0 -> 1
    • fabi-version removed
    • DLIBKINETO_NOCUPTI added
    • Werror added, set range-loop-construct
    • Wunused-local-typedefs added
    • Wno-error added, set deprecated-declarations
    • Wno-invalid-partial-specialization removed
    • Wno-unused-private-field removed
    • Wno-aligned-allocation-unavailable removed
    • Wno-error added, set redundant-decls
  • TORCH_VERSION 2.1.0 -> 2.0.1
  • USE_CUDA ON -> OFF
  • USE_CUDNN ON -> OFF
  • USE_NCCL 1 -> OFF

Are any of these settings relevant to the GPU?
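If I understand the IPEX packaging right (this is my assumption, so take it with a grain of salt), none of these build flags are the real tell: the XPU wheel is built on a CPU-only torch (hence USE_CUDA=OFF), and the Arc device only shows up after import intel_extension_for_pytorch, so torch.__config__.show() wouldn't mention it either way. The deciding check is whether torch.xpu.is_available() returns True after that import.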

I'll keep looking into it on my own; I wouldn't be surprised if it was an installation error on my part.

TheRealUnderscore avatar Oct 17 '23 04:10 TheRealUnderscore

I managed to get 0 tokens of output with IPEX, lol. (screenshots) Anyway, I'm sleepy, I've been at this all day.

i6od avatar Oct 20 '23 10:10 i6od

Some updates regarding failures to build or compile with our systems (FYI):

  • For gbnf/ggml-based compiler patterns, support is in progress, so there might be failures with older oneAPI/dpct (if you are using a previous release).
  • For build issues related to IPEX XPU, I would recommend switching to the latest public IPEX. Also, tag me in case you are having difficulties building or using IPEX on your Arc systems.
  • This support is in progress, and I will post updates periodically, as there is some subsequent work which needs to be merged to use this fully. cc @oobabooga and others who are using our devices. Thank you for your continued support and interest in Arc.

abhilash1910 avatar Oct 21 '23 07:10 abhilash1910

The changes in https://github.com/oobabooga/text-generation-webui/pull/4340 should make the transformers, AutoGPTQ, and GPTQ-for-LLaMa loaders work on Intel Arc now. It's amazing how much @abhilash1910 has implemented in a single PR!

Testing of these various loaders would be appreciated.
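For anyone testing: assuming the usual launch flags still apply (check python server.py --help for the exact loader names, since I'm writing these from memory), something like python server.py --model <your-model> --loader transformers on an Arc machine should be enough to exercise the new code path, and likewise for the AutoGPTQ and GPTQ-for-LLaMa loaders.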

oobabooga avatar Oct 27 '23 02:10 oobabooga

Thank you @abhilash1910, @oobabooga, and all of you for making this happen! I will try to carve out some time this weekend to do some initial testing on the A770.

Thanks again everyone for the hard work!

itlackey avatar Oct 27 '23 03:10 itlackey