RobinJing

Results 8 issues of RobinJing

With starcoder2-3B, the latency of the second and subsequent tokens is poor. Do you have any ideas about it? Thanks!

user issue

OS: Linux Ubuntu 22.04 Kernel: 5.13 GPU: A770 Platform: RPL-P After installing and starting ollama according to the guide, queries get no response, and the ollama side prints no output either. | | | |Compute |Max compute|Max work|Max sub| | |ID| Device Type| Name|capability|units |group |group |Global mem size| | 0|[level_zero:gpu:0]| Intel(R) Arc(TM)...

user issue

Hi, MiniCPM-V is a very popular model these days, but it is not yet supported by the current versions of llama.cpp and ollama. Please refer to the following submission: https://github.com/ollama/ollama/issues/6417...

user issue

Hi, MiniCPM-V is a very popular model these days, but it is not yet supported by the current versions of llama.cpp and ollama. Please refer to the following submission: https://github.com/ollama/ollama/issues/6307...

user issue

**Describe the bug** B60 performance issue with INT4 when using the latest b3 image with vLLM. **How to reproduce** Start vLLM with 1/2/4 cards and a 32B/70B model, you will find the...

user issue
multi-arc

**Describe the bug** B60 cannot use FP16 precision with GLM4-32B-0414 **How to reproduce** With dtype float32 and lowbit fp16, the issue occurs. **Screenshots** ![Image](https://github.com/user-attachments/assets/9d97c444-0976-484b-aa04-11225571a5cb)

user issue
multi-arc

**Describe the bug** B60 cannot use FP16 precision with MoE models **How to reproduce** Load Qwen3-30B-A3B with dtype float16 and lowbit fp16, and the issue occurs. **Screenshots** ![Image](https://github.com/user-attachments/assets/97c46adc-d712-49b7-98ed-135e2e4e6d9a)

user issue
multi-arc

**Describe the bug** After enabling the TTS parameter, the program crashes with an error ![Image](https://github.com/user-attachments/assets/d63fa444-5825-4959-8384-ed1f245dd9e2) **How to reproduce** Steps to reproduce the error: 1. intelanalytics/ipex-llm-inference-cpp-xpu:latest 2. Follow setup guidance on IPEX-LLM...

user issue