EmbeddedLLM: API server for Embedded Device Deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU

Results 8 embeddedllm issues

# Add NPU Engine #28

Reference: [Phi-3 Cookbook - Intel NPU acceleration library](https://github.com/microsoft/Phi-3CookBook/blob/main/md%2F03.Inference%2FAIPC_Inference.md#running-phi-3-with-intel-npu-acceleration-library)

![image](https://github.com/user-attachments/assets/382b73c0-8647-4665-b3c3-31eecbbdcf46)

Update in:
- `modelui.py`
- `engine.py`
- `setup.py`
- `README.md`

Add:
- `npu_egnine.py`
- `requirements-npu.txt`...

type: enhancement / feature

### Describe the bug

The health status of the chat API server cannot be queried while the server is generating responses.

type: bug

### Is your feature request related to a problem? Please describe.

Based on the Intel OpenVINO docs: [Run LLMs with OpenVINO GenAI Flavor on NPU](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide-npu.html)

Now that NPU is available...

type: enhancement / feature

### Is your feature request related to a problem? Please describe.

Intel has its own NPU runtime library.

Reference: [Phi-3 Cookbook - Intel NPU acceleration library](https://github.com/microsoft/Phi-3CookBook/blob/main/md%2F03.Inference%2FAIPC_Inference.md#running-phi-3-with-intel-npu-acceleration-library)

### Describe the...

type: enhancement / feature

# Benchmark

Allow users to run benchmarks themselves to measure model(s) on different backends. It will analyse the token in/out throughput for you in a...

Speech to Text and Text to Speech Model

| Model | Model Link |
| --- | --- |
| Whisper | [link](https://huggingface.co/openai/whisper-tiny) |
| Distil-Whisper | [link](https://huggingface.co/distil-whisper/distil-large-v2) |
|...

type: enhancement / feature

Vision Language Model Support:

| Model | Model Link |
|--------|--------------|
| Phi-3-vision | [link](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct) |
| Qwen-VL | [link](https://huggingface.co/Qwen/Qwen-VL-Chat) |
| GLM-4V | [link](https://huggingface.co/THUDM/glm-4v-9b) |
| LLaVA | -...

type: enhancement / feature

# Description

This issue is related to https://github.com/microsoft/onnxruntime-genai/issues/570. The fix is under development.

type: bug