ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Ma...

Results: 608 ipex-llm issues, sorted by recently updated.

I use ipex-llm to quantize models and push them to the Hugging Face Hub. But it seems `load_low_bit` expects the model to be locally available and can't take it from the Hub. It would...

user issue
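For the issue above, a common workaround is to download the checkpoint from the Hub first and point `load_low_bit` at the local copy. A minimal sketch, assuming the low-bit model was saved with `save_low_bit` and pushed to a hypothetical repo `your-name/qwen-low-bit`:

```python
# Workaround sketch: fetch a low-bit checkpoint from the Hugging Face Hub,
# then load it locally with load_low_bit. The repo id is a placeholder.
from huggingface_hub import snapshot_download
from ipex_llm.transformers import AutoModelForCausalLM

local_dir = snapshot_download(repo_id="your-name/qwen-low-bit")  # hypothetical repo
model = AutoModelForCausalLM.load_low_bit(local_dir, trust_remote_code=True)
```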

When running a Qwen1.5 model, it loads but raises this error when serving:

```
handle: Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/vllm-0.4.2+cpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    task.result()
  File "/usr/local/lib/python3.11/dist-packages/vllm-0.4.2+cpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line...
```

user issue

During operation, the model generates summary replies for multiple long inputs; we hope the first-token and rest-token latency can be further optimized for the Qwen1.5-7B model.

user issue
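For context on the request above, first-token and rest-token latency can be measured separately by streaming the output. A minimal sketch using `transformers`' `TextIteratorStreamer`, assuming `model` and `tokenizer` are already loaded (e.g. a 4-bit Qwen1.5-7B):

```python
# Sketch: time the first generated token vs. the remaining tokens.
import time
from threading import Thread
from transformers import TextIteratorStreamer

inputs = tokenizer("Summarize the following text: ...", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
thread = Thread(target=model.generate,
                kwargs=dict(**inputs, streamer=streamer, max_new_tokens=128))

start = time.perf_counter()
thread.start()
first_token_time = None
n_chunks = 0
for _ in streamer:  # yields decoded text chunks, roughly one per token
    if first_token_time is None:
        first_token_time = time.perf_counter() - start
    n_chunks += 1
thread.join()
total = time.perf_counter() - start
print(f"first token: {first_token_time:.3f}s, "
      f"rest: {(total - first_token_time) / max(n_chunks - 1, 1):.3f}s/token")
```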

I use the glm-4-9b-chat model to process an input of about 4k tokens, and I get a `RuntimeError`. The code is copied directly from https://github.com/THUDM/GLM-4/blob/main/basic_demo/trans_cli_demo.py with some modifications to adapt it to my...

user issue

I want to run ollama with IPEX-LLM on a machine with four Intel Xeon E7-4830 v3 processors and 256 GB of memory. The operating system is Ubuntu 24.04. I followed...

user issue
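For reference, the CPU-only setup path for ollama on ipex-llm is roughly the following, per the ipex-llm llama.cpp/ollama quickstart; exact package flags may vary by release:

```bash
# Rough CPU setup sketch; flags and versions may differ by release.
conda create -n llm-cpp python=3.11 && conda activate llm-cpp
pip install --pre --upgrade ipex-llm[cpp]   # pulls in the ollama/llama.cpp binaries
init-ollama                                 # links the ollama binary into the cwd
./ollama serve
```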

(llm) peiyuan@peiyuan:~/ipex-llm/python/llm/dev/benchmark/all-in-one$ python run.py
/home/peiyuan/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/home/peiyuan/miniconda3/envs/llm/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to...

user issue

It does not seem that ollama running on ipex-llm supports the recent max_loaded_models and num_parallel variables/parameters. Are they supported in the current ollama version under llama-cpp? How does one enable...

user issue
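For reference, recent upstream ollama reads these settings as environment variables; whether the ipex-llm build honors them depends on which ollama version it bundles. A sketch:

```bash
# Upstream ollama concurrency env vars (recent ollama versions); whether
# the ipex-llm build respects them depends on the bundled ollama version.
export OLLAMA_MAX_LOADED_MODELS=2   # max models kept in memory at once
export OLLAMA_NUM_PARALLEL=4        # parallel requests per model
./ollama serve
```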

In load_low_bit(), we check whether model.device.type is in ('cpu', 'meta'). Since some models have no 'device' attribute, accessing model.device.type raises an error. Add a `hasattr(model, 'device')` check before accessing model.device.
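A minimal sketch of the proposed guard inside `load_low_bit` (surrounding code elided):

```python
# Only inspect model.device.type when the attribute exists, since some
# models expose no `device` attribute at all.
if hasattr(model, "device") and model.device.type in ("cpu", "meta"):
    ...  # existing load_low_bit handling for cpu/meta devices
```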

Update the EAGLE README on CPU to include downgrading the setuptools version, since Intel PyTorch is not compatible with setuptools 70.0.0+.
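That is, something along these lines; the exact pin is illustrative, as the note only requires a version below 70.0.0:

```bash
# Downgrade setuptools below 70.0.0 before installing Intel PyTorch.
pip install "setuptools<70.0.0"
```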

Hi, I have installed ipex-llm following the docs at https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/QLoRA-FineTuning and I hit this error:

```
found intel-openmp in /root/miniconda3/envs/llm/lib/libiomp5.so
The installed version of bitsandbytes was compiled without GPU support. 8-bit...
```

user issue