ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Ma...
I use ipex-llm to quantize and push models to the hub. But it seems `load_low_bit` expects the model to be locally available and can't take it from the Hugging Face hub. It would...
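Until hub loading is supported, a possible workaround (a sketch, not a built-in ipex-llm feature; the repo id below is hypothetical) is to download the low-bit checkpoint to a local folder first and point `load_low_bit` at it:

```python
# Sketch of a workaround: fetch the low-bit repo locally, then load it.
# "your-user/your-model-low-bit" is a placeholder repo id.
from huggingface_hub import snapshot_download
from ipex_llm.transformers import AutoModelForCausalLM

local_dir = snapshot_download(repo_id="your-user/your-model-low-bit")
model = AutoModelForCausalLM.load_low_bit(local_dir)
```

This keeps `load_low_bit` working with a plain local path, at the cost of an explicit download step.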
When running a Qwen1.5 model, it loads, but raises this error when serving: ``` handle: Traceback (most recent call last): File "/usr/local/lib/python3.11/dist-packages/vllm-0.4.2+cpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish task.result() File "/usr/local/lib/python3.11/dist-packages/vllm-0.4.2+cpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line...
I use the glm-4-9b-chat model to process an input of about 4k tokens, and I get a "RuntimeError". The code is copied directly from "https://github.com/THUDM/GLM-4/blob/main/basic_demo/trans_cli_demo.py" with some modification to apply to my...
I want to run ollama with IPEX-LLM on a machine with 4 Intel Xeon CPU E7-4830 v3 processors and 256GB of memory. The operating system is Ubuntu 24.04. I followed...
(llm) peiyuan@peiyuan:~/ipex-llm/python/llm/dev/benchmark/all-in-one$ python run.py /home/peiyuan/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations warnings.warn( /home/peiyuan/miniconda3/envs/llm/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to...
It does not seem that ollama running on ipex-llm supports the recent max_loaded_models and num_parallel variables/parameters. Are they supported in the current ollama version under llama-cpp? How does one enable...
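For reference, in upstream Ollama these knobs are set as environment variables before starting the server; whether the ipex-llm build honors them is exactly what this issue asks (the values below are illustrative):

```shell
# Upstream Ollama reads these at server startup.
export OLLAMA_MAX_LOADED_MODELS=2   # max models kept loaded concurrently
export OLLAMA_NUM_PARALLEL=4        # parallel requests handled per model
ollama serve
```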
In load_low_bit(), we check whether model.device.type is in ('cpu', 'meta'). Since some models do not have a 'device' attribute, accessing model.device.type raises an error for them. Add a `hasattr(model, 'device')` check before accessing model.device.
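The guard described above can be sketched as follows (a minimal stand-alone illustration; the actual surrounding code in load_low_bit() differs, and `is_on_cpu_or_meta` is a hypothetical helper name):

```python
from types import SimpleNamespace

def is_on_cpu_or_meta(model):
    # hasattr() short-circuits the check, so a model without a
    # `device` attribute returns False instead of raising AttributeError.
    return hasattr(model, "device") and model.device.type in ("cpu", "meta")

# A model with no `device` attribute no longer crashes:
print(is_on_cpu_or_meta(object()))  # False

# A model reporting a CPU device still passes the check:
cpu_model = SimpleNamespace(device=SimpleNamespace(type="cpu"))
print(is_on_cpu_or_meta(cpu_model))  # True
```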
Update the EAGLE README on CPU to include downgrading the setuptools version, since Intel PyTorch is not compatible with setuptools 70.0.0+.
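The downgrade would look something like this (a sketch; the README may pin a different exact version, the only stated constraint being below 70.0.0):

```shell
# Pin setuptools below 70.0.0 for compatibility with Intel PyTorch.
pip install "setuptools<70"
```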
Hi, I have installed ipex-llm following the docs: https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/QLoRA-FineTuning and I hit this error ``` found intel-openmp in /root/miniconda3/envs/llm/lib/libiomp5.so The installed version of bitsandbytes was compiled without GPU support. 8-bit...