embeddedllm
EmbeddedLLM: API server for Embedded Device Deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU
# Add NPU Engine #28

Reference: [Phi-3 Cookbook - Intel NPU acceleration library](https://github.com/microsoft/Phi-3CookBook/blob/main/md%2F03.Inference%2FAIPC_Inference.md#running-phi-3-with-intel-npu-acceleration-library)

Update in:
- `modelui.py`
- `engine.py`
- `setup.py`
- `README.md`

Add:
- `npu_egnine.py`
- `requirements-npu.txt`
...
### Describe the bug
The health status of the chat API server cannot be queried while the server is generating responses.
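A plausible cause for this class of bug is that token generation runs as blocking work on the server's event loop, so a concurrently scheduled health handler cannot run until generation finishes. The following is a minimal, self-contained sketch of that failure mode (it is not EmbeddedLLM's actual server code; `generate` and `health` are illustrative stand-ins):

```python
import asyncio
import time

# Sketch (assumed, not EmbeddedLLM's code): a blocking generate() call
# on the event loop starves a concurrently scheduled health check.

async def generate():
    time.sleep(0.2)  # blocking call: never yields control to the loop
    return "response"

async def health():
    return "ok"

async def main():
    gen = asyncio.create_task(generate())         # generation scheduled first
    t0 = time.monotonic()
    status = await asyncio.create_task(health())  # health must wait its turn
    waited = time.monotonic() - t0
    await gen
    return status, waited

status, waited = asyncio.run(main())
print(status, waited)  # health only answers after generation completes
```

Offloading the blocking work with `await asyncio.to_thread(...)` (or streaming generation with periodic `await` points) would let the health endpoint respond while generation is in flight.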
### Is your feature request related to a problem? Please describe.
Based on the Intel OpenVINO docs: [Run LLMs with OpenVINO GenAI Flavor on NPU](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide-npu.html). Now that NPU is available...
### Is your feature request related to a problem? Please describe.
Intel has its own NPU runtime library. Reference: [Phi-3 Cookbook - Intel NPU acceleration library](https://github.com/microsoft/Phi-3CookBook/blob/main/md%2F03.Inference%2FAIPC_Inference.md#running-phi-3-with-intel-npu-acceleration-library)

### Describe the...
# Benchmark

Allow users to run benchmarks themselves to evaluate model(s) on different backends. It will analyse the token-in / token-out throughput for you in a...
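The core arithmetic such a benchmark would report can be sketched as follows (a minimal illustration only; the function name and example numbers are assumptions, not EmbeddedLLM's API):

```python
def token_throughput(num_tokens: int, elapsed_s: float) -> float:
    """Tokens per second for one phase: prefill (token-in) or decode (token-out)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return num_tokens / elapsed_s

# Hypothetical run: 128 prompt tokens prefilled in 0.4 s,
# then 256 tokens decoded in 8.0 s.
prefill_tps = token_throughput(128, 0.4)  # token-in throughput
decode_tps = token_throughput(256, 8.0)   # token-out throughput
print(prefill_tps, decode_tps)  # 320.0 32.0
```

Reporting prefill and decode separately matters because the two phases stress a backend differently: prefill is compute-bound, decode is typically memory-bandwidth-bound.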
Speech-to-Text and Text-to-Speech Model Support:

| Model | Model Link |
| --- | --- |
| Whisper | [link](https://huggingface.co/openai/whisper-tiny) |
| Distil-Whisper | [link](https://huggingface.co/distil-whisper/distil-large-v2) |
|...
Vision Language Model Support:

| Model | Model Link |
|--------|--------------|
| Phi-3-vision | [link](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct) |
| Qwen-VL | [link](https://huggingface.co/Qwen/Qwen-VL-Chat) |
| GLM-4V | [link](https://huggingface.co/THUDM/glm-4v-9b) |
| LLaVA | -...
# Description

This issue is related to https://github.com/microsoft/onnxruntime-genai/issues/570. The fix is under development.