ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Ma...
## Description
Add qwen2 support for Pipeline-Parallel-FastAPI example.
## Description
### 1. Why the change?
#11167
### 2. User API changes
### 3. Summary of the change
### 4. How to test?
- [ ] N/A
- [...
With batch 1 and 1024-512, it hung as shown below: THE MYSTERY OF THE CITY](9781441125608_epub_itb-ch5.xhtml) The man's journey took him to the heart of the city, where he discovered a hidden underground...
I'm running Llama 3 inference on an MTL Core Ultra 7 1003H iGPU on Ubuntu 22.04. I followed this link https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama3 and used generate.py. The complete script is: source /opt/intel/oneapi/setvars.sh...
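The report above truncates the run script. For context, a minimal sketch of how the linked llama3 GPU example is typically invoked; the model path, prompt, and flag values below are illustrative assumptions, not the reporter's actual settings:

```shell
# Hedged sketch of a run script for the linked llama3 example.
# Model path, prompt, and flags are assumptions for illustration.
if [ -f /opt/intel/oneapi/setvars.sh ]; then
    # Set up the oneAPI runtime environment (needed for GPU execution)
    source /opt/intel/oneapi/setvars.sh
fi
# Persistent SYCL kernel cache is commonly recommended for iGPU runs (assumption)
export SYCL_CACHE_PERSISTENT=1

# Build the example command; meta-llama/Meta-Llama-3-8B-Instruct is a placeholder
CMD="python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --prompt 'What is AI?' --n-predict 32"
echo "$CMD"
```

On a machine without the GPU stack, the `source` line is skipped and the script only prints the command it would run.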
Team, we are currently using Ubuntu Server 22.04 with kernel 5.15. Could you tell us which oneAPI version and GPU driver version work with the latest IPEX-LLM framework? Thanks!
## Description
Add Pipeline Parallel FastAPI Example QuickStart.