text-generation-webui
Intel Arc thread
This thread is dedicated to discussing the setup of the webui on Intel Arc GPUs.
You are welcome to ask questions as well as share your experiences, tips, and insights to make the process easier for all Intel Arc users.
OK, so some notes:
- In my testing, CLBlast is quite slow compared to CUDA or ROCm when used with llama.cpp. (I'm not using llama-cpp-python, since it refuses to use the GPU no matter what I do despite being built with OpenCL support; with koboldcpp I get ~1.7 t/s with a 13B model.)
- None of the other backends work right now, but they might if IPEX is used (hopefully it's as simple as Intel says it is; it would still require custom versions of each script). Given that IPEX is now supported on Windows (haven't tested that yet, but will with SDUI), it may be worth seeing whether that could work. A minimal sketch of what hooking IPEX into a PyTorch model looks like is below.
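For reference, here is a minimal, untested sketch of what using IPEX directly with a PyTorch model looks like (assuming a working Linux install of intel-extension-for-pytorch with its matching torch build; the toy model is just a placeholder):

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device with PyTorch

# Toy stand-in for a real model; any torch.nn.Module works the same way.
model = torch.nn.Linear(16, 4).to("xpu").eval()
model = ipex.optimize(model)  # apply IPEX's operator/layout optimizations for inference

x = torch.randn(1, 16, device="xpu")
with torch.no_grad():
    y = model(x)
print(y.shape)  # -> torch.Size([1, 4])
```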
Sorry, this is going to be a somewhat long post. I've been looking into this a bit, and unlike other areas of user-facing ML, the LLM community has much more limited options for getting Intel hardware working easily at full speed. For example, in the image generation space it's much easier to slot in Intel's Extension for PyTorch (IPEX), because everyone uses PyTorch directly one way or another and the extension is designed to be fairly easy to insert into a project that already uses PyTorch. In stark contrast, backends in the LLM space do not use PyTorch directly: there's a lot of lower-level programming in C/C++, custom libraries, and custom model deployment, driven by performance considerations and the amount of RAM these models need, which until recently was all but unavailable to the average consumer. So without something like PyTorch in the picture, there is no "easy" option to slot in.
That wouldn't really be a problem if there were a lower-level solution. However, this gets to the real issue: Intel is not taking the same path as AMD when it comes to CUDA compatibility. They have been pursuing a different strategy as a hardware company for the last couple of years. They consolidated their software and unified it under something called oneAPI, with the intention that you write something once and deploy it everywhere in their ecosystem. That goes from higher-level pieces like Intel's extensions for PyTorch/TensorFlow, to middleware libraries like oneMKL/oneDNN, all the way down to Intel's compilers and runtime.

As a result, Intel is not providing anything like HIP to anyone. (There is a community project called chipStar trying to take that approach, but it still seems too early; when I tried it, it wasn't ready to even start tackling complex projects.) What Intel intends is for people to port their software directly from CUDA to SYCL, a Khronos standard that is basically like OpenCL but with C++, and they provide an automatic tool to port over CUDA code. The intention is that the output of the conversion can then, with very little effort, be modified to use their SYCL extensions with DPC++ and pull in their libraries that interface with SYCL, and this would then be able to target everything Intel, from CPUs to GPUs to FPGAs to custom AI accelerators. SYCL then either gets compiled down to Level Zero, the actual API that runs on Intel's devices, or it can target AMD ROCm and NVIDIA CUDA too, as Codeplay announced last year. As a fallback, it compiles to OpenCL, which everyone supports.
As a result of the above, I would say it would take serious effort to get Intel GPUs working at full speed for anything at the moment. That is not to say it's impossible, but it would take either a new software project to make a backend or some sort of large patch to existing backends. It's not that I don't see where Intel is coming from; if their vision actually works, things wouldn't be as difficult to deal with, given a possible "write once, run anywhere" approach. But as it stands, it's not proven enough for people to make that effort, and it is very incompatible with the CUDA and ROCm efforts even if the APIs do roughly the same thing. OpenCL will get Intel GPU users roughly halfway, but it will never be as optimized as CUDA/ROCm, and even if CLBlast could optimize its existing OpenCL code for Intel GPUs tomorrow, the extra effort needed to close that last gap is a pretty dim prospect in my opinion. I have no clue what can be done about that in a planned fashion, but that seems to be the situation at the moment.
It appears that HF Transformers might support XPU now (https://github.com/huggingface/transformers/pull/25714), which would mean that even if nothing else works, this might. (No quants, because no bitsandbytes, but that also seems to be in the works: https://github.com/TimDettmers/bitsandbytes/pull/747.)
I have added an "Intel Arc" option to the one-click installer that installs the appropriate Pytorch version: https://github.com/oobabooga/text-generation-webui/commit/0306b61bb09fb2b5d7b42d02e90267ff62ad3173
The question is if it works when you try to generate text. A good small model for testing is GALACTICA 125M loaded through the transformers loader.
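To check whether generation actually works on the XPU outside the webui, something like this should be enough with GALACTICA 125M (an untested sketch of roughly what the transformers loader does, not the webui's actual code):

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the "xpu" device)
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "facebook/galactica-125m"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).to("xpu").eval()

inputs = tokenizer("The Transformer architecture", return_tensors="pt").to("xpu")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0]))
```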
I have added an "Intel Arc" option to the one-click installer that installs the appropriate Pytorch version: 0306b61
The question is if it works when you try to generate text. A good small model for testing is GALACTICA 125M loaded through the transformers loader.
Keep in mind that Windows native does not work yet because Intel botched their release process, and I suspect most people wanting to try this are on Windows. So only Linux and WSL 2 for now. Earlier versions of the Windows PyTorch pip package also don't support ahead-of-time compilation, which makes the first pass of anything painful. See https://github.com/intel/intel-extension-for-pytorch/issues/398 and https://github.com/intel/intel-extension-for-pytorch/issues/399 for more information.
Intel always manages to botch something when it comes to Arc, so I'm not surprised. Will test this out once I get my WSL 2 install working again.
I have added an "Intel Arc" option to the one-click installer that installs the appropriate Pytorch version: 0306b61
The question is if it works when you try to generate text. A good small model for testing is GALACTICA 125M loaded through the transformers loader.
This doesn't work: it checks whether CUDA is available and then falls back to the CPU rather than trying the extension. Also, it would be a good idea to call "source /opt/intel/oneapi/setvars.sh" from the script to auto-initialize the oneAPI environment; otherwise users might not get it working and wouldn't be able to figure out why.
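For what it's worth, the missing check is something along these lines (a sketch of the fallback order, not the webui's actual code):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Intel XPU (via IPEX), then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    try:
        import intel_extension_for_pytorch  # noqa: F401  (registers torch.xpu)
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return torch.device("xpu")
    except ImportError:
        pass
    return torch.device("cpu")

print(pick_device())
```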
For now, there are unofficial Windows pip packages available here, from one of the WebUI contributors, that address both of the issues I mentioned above for getting IPEX working optimally on Windows natively. Install at your own risk, knowing they are not from Intel and not official.
Intel Extension for PyTorch supports only one specific PyTorch version. If we change the one-click installer to install it, the download works, but the requirements file then overwrites the matching PyTorch build and the system is unable to use the Intel GPU. Can anyone provide a workaround for this? We need to check which PyTorch version is compatible with Intel Extension for PyTorch and download only those versions.
I changed one_click.py so that it downloads and installs the (hopefully) correct PyTorch and related packages, created a requirements.txt for Intel Arc (which may or may not be correct) since there was none, and added the calls for them in one_click.py.
It downloads and installs the packages, but I am stuck at installing the extension requirements. As soon as that part starts, it seems to switch back to CPU (!?), installs the NVIDIA packages, and uninstalls the Intel torch versions.
Update: it looks like the requirements files in the various extensions subfolders pull in the NVIDIA packages as dependencies of the required packages.
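One possible workaround (an untested sketch, not what one_click.py currently does) is to force-reinstall the XPU builds after the extension requirements have run, using the package pins and index URL from Intel's install instructions:

```python
import subprocess
import sys

# Versions and index URL taken from Intel's IPEX install instructions; adjust as they change.
IPEX_INDEX = "https://developer.intel.com/ipex-whl-stable-xpu"
XPU_PACKAGES = [
    "torch==2.0.1a0",
    "torchvision==0.15.2a0",
    "intel_extension_for_pytorch==2.0.110+xpu",
]

# Reinstall on top of whatever the extension requirements pulled in, without
# letting pip drag the CUDA builds back in as dependencies.
subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "--force-reinstall", "--no-deps", *XPU_PACKAGES,
    "-f", IPEX_INDEX,
])
```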
New all-in-one PyTorch-for-Windows packages are available here, which are preferable to the packages I linked earlier, as those had dependencies that couldn't easily be satisfied without a requirements.txt spelling them out. There does seem to be a bug in the newest Windows drivers (see https://github.com/intel/intel-extension-for-pytorch/issues/442); you have to revert to something older than version 4885. Version 4676 here is recommended, as that is what was used to build the pip packages.
Wouldn't it be easiest to make an option to compile llama.cpp with CLBlast?
Hello :) Can the webui be used with an Arc A770 to launch GPTQ models?
Transformers is giving me this error: WARNING:No GPU has been detected by Pytorch. Falling back to CPU mode
After a clean install, it gave me this error:
AssertionError: Torch not compiled with CUDA enabled
After another clean install, I now have this error:
raise RuntimeError("GPU is required to quantize or run quantize model.")
https://github.com/intel-analytics/BigDL/tree/main/python/llm
[bigdl-llm](https://bigdl.readthedocs.io/en/latest/doc/LLM/index.html) is a library for running LLMs (large language models) on Intel XPU (from laptop to GPU to cloud) using INT4 with very low latency (for any PyTorch model).
Can this be used with the webui and Intel Arc GPUs?
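I haven't tried it with the webui, but going by the bigdl-llm docs linked above, standalone usage on an Arc GPU looks roughly like this (untested; the model path is just a placeholder):

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (needed for the "xpu" device)
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM  # drop-in for the HF class

name = "facebook/galactica-125m"  # placeholder; any Hugging Face causal LM
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, load_in_4bit=True).to("xpu")

ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids.to("xpu")
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=16)
print(tokenizer.decode(out[0]))
```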
Seems that Intel has broken the PyTorch extension XPU repo; it's pointing to an HTTP site instead of HTTPS. Here is a workaround for one_click.py: "python -m pip install --trusted-host ec2-52-27-27-201.us-west-2.compute.amazonaws.com torch==2.0.1a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+xpu -f 'http://ec2-52-27-27-201.us-west-2.compute.amazonaws.com/ipex-release.php?device=xpu&repo=us&release=stable'"
But I'm seeing other errors related to the PyTorch version: /text-generation-webui/installer_files/env/lib/python3.11/site-packages/accelerate/utils/imports.py:245: UserWarning: Intel Extension for PyTorch 2.0 needs to work with PyTorch 2.0.*, but PyTorch 2.1.0 is found. Please switch to the matching version and run again.
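A quick way to confirm the mismatch (and whether the XPU is usable at all) from inside the installer's Python environment, assuming both packages import cleanly:

```python
import torch
print("torch:", torch.__version__)  # IPEX 2.0.110+xpu expects torch 2.0.*

import intel_extension_for_pytorch as ipex
print("ipex:", ipex.__version__)
print("xpu available:", hasattr(torch, "xpu") and torch.xpu.is_available())
```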
Hello, does it currently work with Intel Arc (on Arch Linux) without much of a problem? I can run Vladmir's automatic1111 on this computer, so I think this could also run, but I am not sure.
PS: I ran the installer and it exited with the following error:
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Looking in links: https://developer.intel.com/ipex-whl-stable-xpu
ERROR: Could not find a version that satisfies the requirement torch==2.0.1a0 (from versions: 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1)
ERROR: No matching distribution found for torch==2.0.1a0
Command '. "/home/username/diffusion/text-generation-webui/installer_files/conda/etc/profile.d/conda.sh" && conda activate "/home/username/diffusion/text-generation-webui/installer_files/env" && conda install -y -k ninja git && python -m pip install torch==2.0.1a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+xpu -f https://developer.intel.com/ipex-whl-stable-xpu && python -m pip install py-cpuinfo==9.0.0' failed with exit status code '1'.Exiting now.
Try running the start/update script again.
As of right now (2023-11-27) Intel's instructions to install their pytorch extension do not work. In order to get the three necessary wheel files (torch 2.0.1a0, torchvision 0.15.2a0, intel_extension_for_pytorch 2.0.110+xpu) I had to download them as files from the URL provided, then install them with pip.
This is not enough to get ARC support working. The answer still seems to be "it should work, in theory, but nobody's actually done it yet".
As of right now (2023-11-27) Intel's instructions to install their pytorch extension do not work.
Isn't that a stupid move by Intel? I mean, Intel should be doing their best to make their GPUs work with the latest AI stuff and help developers achieve it, instead of focusing on games. These days people constantly talk about AI, not about triple-A 3D games. This kind of constant frustration with AI apps makes me think about switching to NVIDIA (if they fix the damn Wayland problem).
Anyway, please let us know when it works again.
The packages are there at https://developer.intel.com/ipex-whl-stable-xpu, which you can browse; pip just isn't picking them up from that URL for whatever reason now. You need to manually install the packages or link directly to the packages needed. For my Linux install, I had to do the following:
pip install https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/torch-2.0.1a0%2Bcxx11.abi-cp310-cp310-linux_x86_64.whl https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/torchvision-0.15.2a0%2Bcxx11.abi-cp310-cp310-linux_x86_64.whl https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/intel_extension_for_pytorch-2.0.110%2Bxpu-cp310-cp310-linux_x86_64.whl
The exact package versions needed will vary depending on the OS platform and Python version on your machine.
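If it helps, this prints the Python and platform tags to match against the wheel filenames (e.g. cp310 vs cp311):

```python
import platform
import sys

# Compare against the wheel filename, e.g. ...-cp310-cp310-linux_x86_64.whl
print(f"python tag: cp{sys.version_info.major}{sys.version_info.minor}")
print(f"platform:   {platform.system().lower()}_{platform.machine().lower()}")  # rough match for the platform tag
```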
It says that the environment is externally managed and to try pacman -S python-xyz, where xyz is the package. In that case, what do I need to do?
As of right now, there are three possible ways to get this to work with Arc GPUs:
- The Intel Extension for PyTorch, which currently doesn't work on Windows.
- OpenVINO with PyTorch dev versions (unsure if this would actually work; OpenVINO needs to be supported by the frontend to be used, and while OpenVINO supports LLMs, I just haven't seen it used for something like this before).
- The new Intel Extension for Transformers: the most promising, as it supports models converted with llama.cpp (though I don't know if it supports Arc GPUs yet; last I checked, support was forthcoming).
1. The Intel Extension for PyTorch, which currently doesn't work on Windows.
As I posted in https://github.com/oobabooga/text-generation-webui/issues/3761#issuecomment-1771257122, Windows does work with Intel Extension for PyTorch, but you need to install a third-party package since Intel does not provide one at this time. The latest Windows drivers now work too. Intel has stated in their GitHub issue tracker that they will provide Windows packages soon. IPEX is also due for an update soon.
I was under the impression there were still driver issues, but if it works now that's great.
I'm not sure if this is the right place to post this. I receive the error below after installing oobabooga using the default Arc install option on Windows. The install seemed to go well, but running it results in the DLL load error below. Other threads that mention this loading error suggest it might be a PATH issue. I tried adding a few paths to the OS environment but couldn't resolve it. Any suggestions?
It's an Arc A770 on Windows 10, Intel Graphics Driver 31.0.101.5081/31.0.101.5122 (WHQL Certified). I also tried rolling back to driver 4676 and doing a clean install, with the same results. Some of the paths I added were those listed here. I'm also not seeing any of the DLLs listed at that link in those directories. Instead, I have intel-ext-pt-gpu.dll and intel-ext-pt-python.dll in "%PYTHON_ENV_DIR%\lib\site-packages\intel_extension_for_pytorch\bin", and none of the listed DLLs in "%PYTHON_ENV_DIR%\lib\site-packages\torch\lib", though backend_with_compiler.dll is there.
┌───────────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────────┐
│ C:\text-generation-webui\server.py:6 in <module> │
│ │
│ 5 │
│ > 6 import accelerate # This early import makes Intel GPUs happy │
│ 7 │
│ │
│ C:\text-generation-webui\installer_files\env\Lib\site-packages\accelerate\__init__.py:3 in <module> │
│ │
│ 2 │
│ > 3 from .accelerator import Accelerator │
│ 4 from .big_modeling import ( │
│ │
│ C:\text-generation-webui\installer_files\env\Lib\site-packages\accelerate\accelerator.py:32 in <module> │
│ │
│ 31 │
│ > 32 import torch │
│ 33 import torch.utils.hooks as hooks │
│ │
│ C:\text-generation-webui\installer_files\env\Lib\site-packages\torch\__init__.py:139 in <module> │
│ │
│ 138 err.strerror += f' Error loading "{dll}" or one of its dependencies.' │
│ > 139 raise err │
│ 140 │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
OSError: [WinError 126] The specified module could not be found. Error loading
"C:\text-generation-webui\installer_files\env\Lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its
dependencies.
Press any key to continue . . .
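If it's useful for debugging, here is a rough diagnostic sketch (run from the activated installer_files\env environment; this is just my own suggestion, not part of the webui) that tries to load each DLL in torch\lib directly, so Windows reports which dependency is actually missing rather than just WinError 126:

```python
import ctypes
import glob
import os
import sys

# sys.prefix points at the active environment, e.g. ...\installer_files\env
torch_lib = os.path.join(sys.prefix, "Lib", "site-packages", "torch", "lib")

for dll in sorted(glob.glob(os.path.join(torch_lib, "*.dll"))):
    try:
        ctypes.WinDLL(dll)
        print("ok:    ", os.path.basename(dll))
    except OSError as err:
        print("FAILED:", os.path.basename(dll), "-", err)
```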
I updated the code and ran it again (did not do anything else). This time it got past the previous crash ("No matching distribution found for torch==2.0.1a0"), but after downloading a lot of stuff, it crashed with the following. If I run the script again, I get the same output below.
*******************************************************************
* WARNING: You haven't downloaded any model yet.
* Once the web UI launches, head over to the "Model" tab and download one.
*******************************************************************
╭───────────────────────────────── Traceback (most recent call last) ─────────────────────────────────╮
│ /home/username/diffusion/text-generation-webui/server.py:6 in <module> │
│ │
│ 5 │
│ ❱ 6 import accelerate # This early import makes Intel GPUs happy │
│ 7 │
│ │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/accelera │
│ te/__init__.py:3 in <module> │
│ │
│ 2 │
│ ❱ 3 from .accelerator import Accelerator │
│ 4 from .big_modeling import ( │
│ │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/accelera │
│ te/accelerator.py:32 in <module> │
│ │
│ 31 │
│ ❱ 32 import torch │
│ 33 import torch.utils.hooks as hooks │
│ │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/torch/__ │
│ init__.py:234 in <module> │
│ │
│ 233 if USE_GLOBAL_DEPS: │
│ ❱ 234 _load_global_deps() │
│ 235 from torch._C import * # noqa: F403 │
│ │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/torch/__ │
│ init__.py:193 in _load_global_deps │
│ │
│ 192 if not is_cuda_lib_err: │
│ ❱ 193 raise err │
│ 194 for lib_folder, lib_name in cuda_libs.items(): │
│ │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/torch/__ │
│ init__.py:174 in _load_global_deps │
│ │
│ 173 try: │
│ ❱ 174 ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL) │
│ 175 except OSError as err: │
│ │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/ctypes/__init__.py:376 │
│ in __init__ │
│ │
│ 375 if handle is None: │
│ ❱ 376 self._handle = _dlopen(self._name, mode) │
│ 377 else: │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory
@HubKing Run "source /opt/intel/oneapi/setvars.sh" and try again. If you don't have it, make sure to install the oneAPI Basekit.
Thanks. intel-oneapi-basekit 2024.0.0.49564-2 was already installed, and running that command solved the problem. But why did this problem happen in the first place? Is the user of that package supposed to add that shell script to the environment manually?
Yes, exactly that.
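For anyone else hitting the libmkl error: a crude check you could drop at the top of a launch script (just a heuristic suggestion, not an official mechanism) is to look for oneAPI paths in LD_LIBRARY_PATH before torch is imported:

```python
import os

# setvars.sh is what puts libmkl_intel_lp64.so.2 (and friends) on LD_LIBRARY_PATH.
ld_path = os.environ.get("LD_LIBRARY_PATH", "")
if "oneapi" not in ld_path.lower():
    print("oneAPI environment does not look active; run:")
    print("  source /opt/intel/oneapi/setvars.sh")
```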
What models run on an Intel Arc GPU? It seems like .gguf models are running on the CPU.
@djstraylight
For me at least, GGUFs load by default via llama.cpp, which is in the process of implementing Arc support.