ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Ma...

Results: 608 ipex-llm issues, sorted by recently updated.

I use ipex-llm to quantize models and push them to the Hugging Face Hub. But it seems `load_low_bit` expects the model to be locally available and can't take it from the Hub. It would...

user issue
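For the issue above, a common workaround is to download the checkpoint from the Hub first and point `load_low_bit` at the local copy. A minimal sketch, assuming the low-bit model was saved with `save_low_bit` and pushed to a hypothetical repo `your-name/qwen-low-bit`:

```python
# Workaround sketch: fetch a low-bit checkpoint from the Hugging Face Hub,
# then load it locally with load_low_bit. The repo id is a placeholder.
from huggingface_hub import snapshot_download
from ipex_llm.transformers import AutoModelForCausalLM

local_dir = snapshot_download(repo_id="your-name/qwen-low-bit")  # hypothetical repo
model = AutoModelForCausalLM.load_low_bit(local_dir, trust_remote_code=True)
```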

When running a Qwen1.5 model, it loads but raises this error when serving:

```
handle: Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/vllm-0.4.2+cpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    task.result()
  File "/usr/local/lib/python3.11/dist-packages/vllm-0.4.2+cpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line...
```

user issue

During operation, the model generates summary replies for multiple long inputs; we hope the first-token and rest-token latency can be further optimized for the Qwen1.5-7B model.

user issue
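For context on the request above, first-token and rest-token latency can be measured separately by streaming the output. A minimal sketch using `transformers`' `TextIteratorStreamer`, assuming `model` and `tokenizer` are already loaded (e.g. a 4-bit Qwen1.5-7B):

```python
# Sketch: time the first generated token vs. the remaining tokens.
import time
from threading import Thread
from transformers import TextIteratorStreamer

inputs = tokenizer("Summarize the following text: ...", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
thread = Thread(target=model.generate,
                kwargs=dict(**inputs, streamer=streamer, max_new_tokens=128))

start = time.perf_counter()
thread.start()
first_token_time = None
n_chunks = 0
for _ in streamer:  # yields decoded text chunks, roughly one per token
    if first_token_time is None:
        first_token_time = time.perf_counter() - start
    n_chunks += 1
thread.join()
total = time.perf_counter() - start
print(f"first token: {first_token_time:.3f}s, "
      f"rest: {(total - first_token_time) / max(n_chunks - 1, 1):.3f}s/token")
```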

I use the glm-4-9b-chat model to process an input of about 4k tokens, and I get a `RuntimeError`. The code is copied directly from https://github.com/THUDM/GLM-4/blob/main/basic_demo/trans_cli_demo.py with some modifications to adapt it to my...

user issue

I want to run ollama with IPEX-LLM on a machine with four Intel Xeon E7-4830 v3 processors and 256 GB of memory. The operating system is Ubuntu 24.04. I followed...

user issue
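For reference, the CPU-only setup path for ollama on ipex-llm is roughly the following, per the ipex-llm llama.cpp/ollama quickstart; exact package flags may vary by release:

```bash
# Rough CPU setup sketch; flags and versions may differ by release.
conda create -n llm-cpp python=3.11 && conda activate llm-cpp
pip install --pre --upgrade ipex-llm[cpp]   # pulls in the ollama/llama.cpp binaries
init-ollama                                 # links the ollama binary into the cwd
./ollama serve
```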

(llm) peiyuan@peiyuan:~/ipex-llm/python/llm/dev/benchmark/all-in-one$ python run.py
/home/peiyuan/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/home/peiyuan/miniconda3/envs/llm/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to...

user issue

It does not seem that ollama running on ipex-llm supports the recent max_loaded_models and num_parallel variables/parameters. Are they supported in the current ollama version under llama-cpp? How does one enable...

user issue
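For reference, recent upstream ollama reads these settings as environment variables; whether the ipex-llm build honors them depends on which ollama version it bundles. A sketch:

```bash
# Upstream ollama concurrency env vars (recent ollama versions); whether
# the ipex-llm build respects them depends on the bundled ollama version.
export OLLAMA_MAX_LOADED_MODELS=2   # max models kept in memory at once
export OLLAMA_NUM_PARALLEL=4        # parallel requests per model
./ollama serve
```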

In load_low_bit(), we check whether model.device.type is in ('cpu', 'meta'). Since some models have no 'device' attribute, accessing model.device.type raises an error. Add a `hasattr(model, 'device')` check before accessing model.device.
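A minimal sketch of the proposed guard inside `load_low_bit` (surrounding code elided):

```python
# Only inspect model.device.type when the attribute exists, since some
# models expose no `device` attribute at all.
if hasattr(model, "device") and model.device.type in ("cpu", "meta"):
    ...  # existing load_low_bit handling for cpu/meta devices
```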

Update the EAGLE README on CPU to include downgrading the setuptools version, since Intel PyTorch is not compatible with setuptools 70.0.0+.
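That is, something along these lines; the exact pin is illustrative, as the note only requires a version below 70.0.0:

```bash
# Downgrade setuptools below 70.0.0 before installing Intel PyTorch.
pip install "setuptools<70.0.0"
```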

Hi, I have installed ipex-llm following the docs at https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/QLoRA-FineTuning and I hit this error:

```
found intel-openmp in /root/miniconda3/envs/llm/lib/libiomp5.so
The installed version of bitsandbytes was compiled without GPU support. 8-bit...
```

user issue