
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max).

Results: 608 ipex-llm issues (sorted by recently updated)

## Description

### 1. Why the change?

### 2. User API changes

### 3. Summary of the change

### 4. How to test?

- [ ] N/A
- [ ]...

Hi team, I want to release the related memory via `del model` after `model.generate()`, but it does not work as I expected. The demo code is as below,...
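A minimal sketch of the usual teardown pattern for freeing accelerator memory after generation, assuming a model on an Intel XPU via intel_extension_for_pytorch; the `model` variable is the one from the issue, everything else is illustrative:

```
import gc

import torch
import intel_extension_for_pytorch as ipex  # registers the torch.xpu backend

# ... model.generate(...) has already run ...

# Drop every Python reference to the model so its tensors become
# unreachable, then force a garbage-collection pass.
del model
gc.collect()

# Return cached allocator blocks to the driver; without this the memory
# can stay reserved by the process even though the tensors are freed.
torch.xpu.synchronize()
torch.xpu.empty_cache()
```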

user issue

![微信图片_20240627142704](https://github.com/intel-analytics/ipex-llm/assets/166265863/13764abc-6586-43cc-8ecf-08bb042f194c) ![微信图片_20240627142723](https://github.com/intel-analytics/ipex-llm/assets/166265863/c88be310-41c4-40b0-bff6-f0f5cdcd47e2) ![微信图片_20240627142727](https://github.com/intel-analytics/ipex-llm/assets/166265863/ebc91b91-d058-4c72-9451-bf90ebafeab3) Using ollama with qwen2:7b.

user issue

python/llm/example/GPU/Deepspeed-AutoTP/run_qwen_14b_arc_2_card.sh, python/llm/example/GPU/Deepspeed-AutoTP/run_vicuna_33b_arc_2_card.sh, python/llm/dev/benchmark/all-in-one/run-deepspeed-arc.sh: currently the following code only enables the setting on Intel Core CPUs, but on Intel Xeon CPUs SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS also needs to be enabled to improve performance.

```
if grep...
```
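As a hedged illustration (the files above are shell scripts; this is an equivalent Python sketch, not the repository's fix), the variable can simply be set unconditionally before the XPU runtime is initialized:

```
import os

# Must be set before the first SYCL/Level-Zero device is initialized,
# i.e. before importing intel_extension_for_pytorch or touching torch.xpu.
os.environ.setdefault("SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS", "1")

import torch
import intel_extension_for_pytorch as ipex  # registers the torch.xpu backend
```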

user issue

The current all-in-one benchmark saves the csv file with a name that contains only the date. If we run multiple tests on the same day, the older test data will be overwritten...
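A minimal sketch of one way to avoid the collision, adding the time of day to the file name; the prefix is illustrative, not the benchmark's actual naming scheme:

```
from datetime import datetime

# A second-resolution timestamp makes each run's output file unique,
# so runs from the same day no longer overwrite each other.
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
csv_name = f"all-in-one-results-{timestamp}.csv"
print(csv_name)  # e.g. all-in-one-results-2024-06-27_14-27-04.csv
```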

user issue

Optimized the Mixtral model by using ipex_llm.optimize_model() to transform it to low-bit, then saved it and loaded it again. Set "max_length": 1024, yet am getting a warning that `max_length` (=20)...
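That warning usually means the limit never reached `generate()`, which then falls back to the default GenerationConfig (`max_length=20`). A minimal sketch of passing the limit explicitly, assuming the standard Hugging Face API; the model id and prompt are illustrative:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from ipex_llm import optimize_model

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
model = optimize_model(AutoModelForCausalLM.from_pretrained(model_id))
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is Mixtral?", return_tensors="pt")
# Passing the limit as a generate() kwarg overrides the model's
# GenerationConfig, whose max_length defaults to 20 (hence the warning).
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```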

user issue

Target platform: MTL (Core Ultra 7 165H). Issue: codeqwen-1_5-7b-chat-q4_k_m.gguf using ipex-llm as the backend for llama.cpp has a performance gap compared with PyTorch. Minimum throughput requirement: >15 tokens/s; ideal throughput requirement: >...

user issue

I met this issue while using ollama on an MTL iGPU: ![image](https://github.com/intel-analytics/ipex-llm/assets/92354341/b9cc7b61-3b61-4615-b1f2-40a85ac22aee) My IPEX-LLM version is as below: ![image](https://github.com/intel-analytics/ipex-llm/assets/92354341/5f0d3e51-57fa-4b3e-8e85-190218be9ed2) iGPU info is as below: ![image](https://github.com/intel-analytics/ipex-llm/assets/92354341/9003a88b-ad87-42ce-9914-52e03a8d6315)

user issue

The traceback is as follows; I was running ChatGLM4-9b-chat on my laptop. Device configuration:
- OS: Win 11 23H2 (22631.3737)
- CPU: i7-1260P
- GPU: 'Intel(R) Iris(R) Xe Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu,...

user issue

Installation steps on the host:

```
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.37.0
pip install oneccl_bind_pt==2.1.100...
```
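To check that the install above actually exposes the GPU, a common sanity check is the following; only the packages installed above are assumed:

```
import torch
import intel_extension_for_pytorch as ipex  # registers the torch.xpu backend

# Should print True and name an Intel GPU if the ipex-llm[xpu]
# install succeeded and the Level-Zero driver is visible.
print(torch.xpu.is_available())
print(torch.xpu.get_device_name(0))
```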

user issue