embeddedllm
EmbeddedLLM: API server for Embedded Device Deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU
# Add NPU Engine #28

Reference: [Phi-3 Cookbook - Intel NPU acceleration library](https://github.com/microsoft/Phi-3CookBook/blob/main/md%2F03.Inference%2FAIPC_Inference.md#running-phi-3-with-intel-npu-acceleration-library)

Update in:
- `modelui.py`
- `engine.py`
- `setup.py`
- `README.md`

Add:
- `npu_egnine.py`
- `requirements-npu.txt`
...
### Describe the bug
The health status of the chat API server cannot be queried while the server is generating responses.
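A plausible cause for this class of bug is that token generation runs as blocking work on the server's event loop, so a concurrently scheduled health handler cannot run until generation finishes. The following is a minimal, self-contained sketch of that failure mode (it is not EmbeddedLLM's actual server code; `generate` and `health` are illustrative stand-ins):

```python
import asyncio
import time

# Sketch (assumed, not EmbeddedLLM's code): a blocking generate() call
# on the event loop starves a concurrently scheduled health check.

async def generate():
    time.sleep(0.2)  # blocking call: never yields control to the loop
    return "response"

async def health():
    return "ok"

async def main():
    gen = asyncio.create_task(generate())         # generation scheduled first
    t0 = time.monotonic()
    status = await asyncio.create_task(health())  # health must wait its turn
    waited = time.monotonic() - t0
    await gen
    return status, waited

status, waited = asyncio.run(main())
print(status, waited)  # health only answers after generation completes
```

Offloading the blocking work with `await asyncio.to_thread(...)` (or streaming generation with periodic `await` points) would let the health endpoint respond while generation is in flight.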
### Is your feature request related to a problem? Please describe.
Based on the Intel OpenVINO docs: [Run LLMs with OpenVINO GenAI Flavor on NPU](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide-npu.html). Now that NPU is available...
### Is your feature request related to a problem? Please describe.
Intel has its own NPU runtime library. Reference: [Phi-3 Cookbook - Intel NPU acceleration library](https://github.com/microsoft/Phi-3CookBook/blob/main/md%2F03.Inference%2FAIPC_Inference.md#running-phi-3-with-intel-npu-acceleration-library)

### Describe the...
# Benchmark

Allow users to run benchmarks themselves to evaluate model(s) on different backends. It will analyse the token-in / token-out throughput for you in a...
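The core arithmetic such a benchmark would report can be sketched as follows (a minimal illustration only; the function name and example numbers are assumptions, not EmbeddedLLM's API):

```python
def token_throughput(num_tokens: int, elapsed_s: float) -> float:
    """Tokens per second for one phase: prefill (token-in) or decode (token-out)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return num_tokens / elapsed_s

# Hypothetical run: 128 prompt tokens prefilled in 0.4 s,
# then 256 tokens decoded in 8.0 s.
prefill_tps = token_throughput(128, 0.4)  # token-in throughput
decode_tps = token_throughput(256, 8.0)   # token-out throughput
print(prefill_tps, decode_tps)  # 320.0 32.0
```

Reporting prefill and decode separately matters because the two phases stress a backend differently: prefill is compute-bound, decode is typically memory-bandwidth-bound.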
Speech-to-Text and Text-to-Speech Model Support:

| Model | Model Link |
| --- | --- |
| Whisper | [link](https://huggingface.co/openai/whisper-tiny) |
| Distil-Whisper | [link](https://huggingface.co/distil-whisper/distil-large-v2) |
|...
Vision Language Model Support:

| Model | Model Link |
|--------|--------------|
| Phi-3-vision | [link](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct) |
| Qwen-VL | [link](https://huggingface.co/Qwen/Qwen-VL-Chat) |
| GLM-4V | [link](https://huggingface.co/THUDM/glm-4v-9b) |
| LLaVA | -...
# Description

This issue is related to https://github.com/microsoft/onnxruntime-genai/issues/570. The fix is under development.