RobinJing
With starcoder2-3B, the 2nd+ token latency performs poorly. Do you have any ideas about it? Thanks!
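For reference, here is a minimal sketch (my own, not from the thread) of how first-token vs. 2nd+ token latency can be separated with the ipex-llm transformers API; the model id, prompt, and `load_in_4bit` setting are assumptions:

```python
# Sketch: estimate 2nd+ token latency by subtracting the first-token cost
# from a longer generation run. Assumes an XPU build of PyTorch/ipex-llm.
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "bigcode/starcoder2-3b"  # assumption: HF id for starcoder2-3B
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
).to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to("xpu")

# One-token run captures first-token (prefill) latency.
torch.xpu.synchronize()
t0 = time.perf_counter()
model.generate(**inputs, max_new_tokens=1)
torch.xpu.synchronize()
first_token = time.perf_counter() - t0

# Longer run: the remaining tokens give the average 2nd+ token latency.
n_new = 64
t0 = time.perf_counter()
model.generate(**inputs, max_new_tokens=n_new)
torch.xpu.synchronize()
total = time.perf_counter() - t0

print(f"1st token: {first_token:.3f}s, "
      f"2nd+ token avg: {(total - first_token) / (n_new - 1):.3f}s")
```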
OS: Linux Ubuntu 22.04, Kernel: 5.13, GPU: A770, Platform: RPL-P. After installing and starting ollama following the guide, queries get no response, and there is no output on the ollama side.

|ID|Device Type       |Name             |Compute capability|Max compute units|Max work group|Max sub group|Global mem size|
| 0|[level_zero:gpu:0]|Intel(R) Arc(TM)...
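Before suspecting ollama itself, it may help to confirm that the XPU runtime can actually see the A770. A quick sanity-check sketch (my own, assuming an XPU-enabled PyTorch with IPEX installed):

```python
# Verify the Arc GPU is visible to the XPU backend that ipex-llm/ollama rely on.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the "xpu" device
                                            # on older PyTorch; 2.5+ has torch.xpu built in)

print("XPU available:", torch.xpu.is_available())
for i in range(torch.xpu.device_count()):
    print(i, torch.xpu.get_device_name(i))
```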
Hi, MiniCPM-V is a very popular model these days. Although it is not yet supported by the current versions of llama.cpp and ollama, please refer to the following submission: https://github.com/ollama/ollama/issues/6417...
Hi, MiniCPM-V is a very popular model these days. Although it is not yet supported by the current versions of llama.cpp and ollama, please refer to the following submission: https://github.com/ollama/ollama/issues/6307...
**Describe the bug** B60 performance issue with INT4, using the latest b3 image with vLLM. **How to reproduce** Start vLLM with 1/2/4 cards and a 32B/70B model; you will find the...
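A minimal sketch of this reproduction, assuming the ipex-llm vLLM build exposes the standard vLLM Python entry point; the model id is an assumption, and the exact INT4/low-bit flag used by the b3 image is deployment-specific:

```python
# Sketch: start vLLM across multiple B60 cards and run one request.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct",  # assumption: any 32B/70B checkpoint
    tensor_parallel_size=2,             # 1 / 2 / 4 cards as in the report
    # INT4/low-bit settings are specific to the ipex-llm vLLM image; consult
    # its docs for the exact quantization flag used here.
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```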
**Describe the bug** B60 cannot use FP16 precision with GLM4-32B-0414. **How to reproduce** With dtype float32 and lowbit fp16, the issue occurs. **Screenshots**
**Describe the bug** B60 cannot use FP16 precision with MoE. **How to reproduce** Load Qwen3-30B-A3B with dtype float16 and lowbit fp16; the issue occurs. **Screenshots**
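Both FP16 reports above come down to the same load path. A sketch of it, assuming the ipex-llm transformers API; the Hugging Face model ids are my guesses at the checkpoints named in the issues:

```python
# Sketch: load a model with the low-bit fp16 path that triggers the failure.
import torch
from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B",       # or "THUDM/GLM-4-32B-0414" for the first report
    load_in_low_bit="fp16",     # low-bit fp16 setting from both reports
    torch_dtype=torch.float16,  # the GLM4 report used torch_dtype=torch.float32
    trust_remote_code=True,
).to("xpu")
```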
**Describe the bug** After enabling the TTS parameter, the program crashes with an error. **How to reproduce** Steps to reproduce the error: 1. intelanalytics/ipex-llm-inference-cpp-xpu:latest 2. Follow the setup guidance on IPEX-LLM...