ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Ma...
## Description

### 1. Why the change?

### 2. User API changes

### 3. Summary of the change

### 4. How to test?

- [ ] N/A
- [ ]...
As the title says, running starcoder2 either times out or appears stuck. Logs attached. [starcoder2_timout.txt](https://github.com/user-attachments/files/15588642/starcoder2_timout.txt)
hi, I have successfully used the code below to test token-generation speed with qwen-7b under ipex.

```python
def main(model_dir = "Qwen/Qwen-7B-Chat"):
    seed = 1024
    max_experiment_times = 1...
```
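For reference, here is a minimal sketch of such a throughput test using ipex-llm's HuggingFace-style `AutoModelForCausalLM`; the model path, prompt, and token counts are illustrative placeholders rather than the reporter's exact settings:

```python
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "Qwen/Qwen-7B-Chat"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,       # ipex-llm low-bit optimization
    trust_remote_code=True,
)

prompt = "What is AI?"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.inference_mode():
    model.generate(**inputs, max_new_tokens=32)          # warm-up run
    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=128)
    elapsed = time.perf_counter() - start

# Count only newly generated tokens, excluding the prompt.
new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.2f} tokens/s")
```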
Using the ipex-llm docker version for inferencing, but at inference time it hits errors from the util files. Below is the log:

```
------------------------------------------------------------------------------------------------------------------------
Inferencing ./samples/customer_sku_transformation.txt ...
------------------------------------------------------------------------------------------------------------------------
The installed version of...
```
platform: Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
os: Suse 13
model: mistralai/Mistral-7B-Instruct-v0.2
ipex-llm: 2.1.0b20240515
transformers: 4.37.0
ldd: 2.22
gcc/g++: 11.1.0

After "Loading checkpoint shards" reaches 100%, it shows: `Error: Failed to load the...`
version: 2.1.0b20240610

error:

```
ipex_llm/transformers/models/chatglm4.py", line 342, in core_attn_forward
NameError: name 'math' is not defined
```
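The traceback indicates that `core_attn_forward` in chatglm4.py uses the `math` module without a module-level import. A hypothetical minimal reproduction of the failure mode (the function body below is a simplified stand-in, not the real ipex-llm code); adding the `import math` line is the expected fix:

```python
import math   # missing in the affected build -- without it, the call below
              # raises NameError: name 'math' is not defined
import torch

def core_attn_forward(query: torch.Tensor, key: torch.Tensor) -> torch.Tensor:
    # Simplified stand-in for the real attention scaling in chatglm4.py.
    scale = 1.0 / math.sqrt(query.size(-1))
    return (query @ key.transpose(-2, -1)) * scale
```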
When I run the `generate.py` script, I get the following error:

```bash
python ./generate.py --repo-id-or-model-path 'google/codegemma-7b-it' --prompt 'Write a hello world program in Python' --n-predict 32
Traceback (most...
```
![微信图片_20240605135354](https://github.com/intel-analytics/ipex-llm/assets/166265863/4bcfc12a-ead8-468a-ab24-dfe60fb1d9d4)

The following error occurred after the program had been running for a while; please see the attached screenshot. No way to reproduce it has been found so far. This is GPU-accelerated Ollama running Qwen1.5...
Llamaindex-ts example on CPU and Intel GPU

* Agent
* RAG
Qwen1.5-7B goes OOM with an 8K input. After modifying qwen1.5\Lib\site-packages\transformers\models\qwen2\modeling_qwen2.py to comment out `logits = logits.float()`, it runs and memory usage drops a lot. Does this change affect the model in other ways? Can the overall memory consumption of this model be optimized?
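For context, a toy sketch of the line being commented out (the shapes here are tiny placeholders; the vocab size is Qwen1.5-7B's, and the real sequence length in the report is 8192):

```python
import torch

# At real scale -- an 8K prompt times Qwen1.5-7B's ~152K vocab -- the
# `logits.float()` upcast materializes an extra ~5 GB fp32 buffer on top of
# the ~2.5 GB fp16 logits, which is consistent with the reported 8K OOM.
batch, seq_len, vocab = 1, 8, 151936      # seq_len is 8192 in the real case
logits = torch.zeros(batch, seq_len, vocab, dtype=torch.float16)  # lm_head output
# logits = logits.float()   # the commented-out upcast in modeling_qwen2.py
print(logits.dtype)         # stays torch.float16 once the upcast is removed
```

As far as I can tell, skipping the upcast only changes the numerical precision of the final logits; for inference and sampling the effect is usually negligible, while the fp32 upcast matters mainly for numerically stable loss computation during training.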