ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Ma...
## Description
### 1. Why the change?
### 2. User API changes
### 3. Summary of the change
### 4. How to test?
- [ ] N/A
- [ ]...
Hi team, I want to release the related memory via `del model` after `model.generate`, but it does not work as I expected. The demo code is as below,...
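A common reason `del model` appears to do nothing is that other references to the model still exist, so the object is never actually freed. The following framework-free sketch (the `Model` class and variable names are illustrative, not from the issue's demo code) shows that `del` only removes one name binding; the object is released only when the last reference is gone:

```python
import gc
import weakref

class Model:
    """Stand-in for a large model object."""
    pass

model = Model()
alias = model                   # a second reference, e.g. held by a pipeline or cache
alive = weakref.ref(model)      # lets us observe when the object is actually freed

del model                       # removes one name only -- the object is still alive
assert alive() is not None

del alias                       # last reference gone; CPython frees it immediately
gc.collect()                    # also collect any reference cycles
assert alive() is None
```

In practice, frameworks often keep hidden references (past key/value caches, hooks, closures); after dropping every reference, calling `gc.collect()` and then the backend's cache-release call (for Intel GPU builds of PyTorch, typically `torch.xpu.empty_cache()`) is usually needed before device memory is returned.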
![微信图片_20240627142704](https://github.com/intel-analytics/ipex-llm/assets/166265863/13764abc-6586-43cc-8ecf-08bb042f194c) ![微信图片_20240627142723](https://github.com/intel-analytics/ipex-llm/assets/166265863/c88be310-41c4-40b0-bff6-f0f5cdcd47e2) ![微信图片_20240627142727](https://github.com/intel-analytics/ipex-llm/assets/166265863/ebc91b91-d058-4c72-9451-bf90ebafeab3) Using ollama qwen2:7b
python/llm/example/GPU/Deepspeed-AutoTP/run_qwen_14b_arc_2_card.sh python/llm/example/GPU/Deepspeed-AutoTP/run_vicuna_33b_arc_2_card.sh python/llm/dev/benchmark/all-in-one/run-deepspeed-arc.sh Currently, the following code is only enabled on Intel Core CPUs. On Intel Xeon CPUs, SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS also needs to be enabled to improve performance. ``` if grep...
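One way the scripts' CPU check could be extended is sketched below. This is a hypothetical helper, not the project's actual code: the `should_enable` function name and the model-string matching are assumptions, but the environment variable name is the one from the issue:

```shell
#!/bin/sh
# Hypothetical sketch: decide from a CPU model string whether to enable
# immediate command lists on both Core and Xeon parts.
should_enable() {
  case "$1" in
    *Core*|*Xeon*) return 0 ;;
    *)             return 1 ;;
  esac
}

for model in "Intel(R) Core(TM) Ultra 7 165H" "Intel(R) Xeon(R) Gold 6348" "AuthenticAMD"; do
  if should_enable "$model"; then
    # In a real script this would be: export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
    echo "$model -> enable"
  else
    echo "$model -> skip"
  fi
done
```

In the real scripts the model string would come from something like `grep 'model name' /proc/cpuinfo`, matching the truncated `if grep...` fragment above.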
The current all-in-one benchmark saves the csv file with a name that only contains the date. If we run multiple tests on the same day, the older test data will be overwritten...
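A straightforward fix is to include the time of day in the filename. This is a minimal sketch; the `result_csv_name` helper and the `all-in-one` prefix are hypothetical, not the benchmark's actual naming code:

```python
from datetime import datetime

def result_csv_name(prefix: str = "all-in-one") -> str:
    """Include hours/minutes/seconds so multiple runs on the same
    day no longer overwrite each other's results."""
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    return f"{prefix}-{stamp}.csv"

print(result_csv_name())   # e.g. all-in-one-2024-06-27_14-27-04.csv
```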
Optimized the Mixtral model by using ipex_llm.optimize_model() to transform it to low-bit, then saved and reloaded it. Set "max_length": 1024, yet I am getting a warning that `max_length` (=20)...
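The "`max_length` (=20)" wording in that warning usually means the value never reached `generate()`, so the default generation config was used. A simplified, framework-free sketch of the assumed precedence (explicit call-site kwargs override generation-config defaults; values stored anywhere else are ignored):

```python
# Simplified model of the assumed lookup order in a generate() call.
generation_config_defaults = {"max_length": 20}

def generate(**kwargs):
    """Toy stand-in: merge call-site kwargs over the config defaults."""
    merged = {**generation_config_defaults, **kwargs}
    return merged["max_length"]

assert generate() == 20               # value not passed -> default 20, plus the warning
assert generate(max_length=1024) == 1024  # pass it to generate() directly
```

With Hugging Face-style APIs, passing `max_length=1024` (or `max_new_tokens=...`) directly to `model.generate(...)`, or setting it on `model.generation_config`, typically silences the warning.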
Target Platform: MTL (Core Ultra 7 165H) Issue: codeqwen-1_5-7b-chat-q4_k_m.gguf using ipex-llm as the backend for llama.cpp has a performance gap compared with PyTorch. Minimum throughput requirement: >15 tokens/s; ideal throughput requirement: >...
I hit this issue while using ollama on an MTL iGPU ![image](https://github.com/intel-analytics/ipex-llm/assets/92354341/b9cc7b61-3b61-4615-b1f2-40a85ac22aee) My IPEX-LLM version is as below ![image](https://github.com/intel-analytics/ipex-llm/assets/92354341/5f0d3e51-57fa-4b3e-8e85-190218be9ed2) iGPU info is as below ![image](https://github.com/intel-analytics/ipex-llm/assets/92354341/9003a88b-ad87-42ce-9914-52e03a8d6315)
The traceback is as follows; I was running ChatGLM4-9b-chat on my laptop. Device configuration: OS: Win 11 23H2 (22631.3737) - CPU: i7-1260P - GPU: 'Intel(R) Iris(R) Xe Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu,...
Host installation steps:
```
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.37.0
pip install oneccl_bind_pt==2.1.100...
```