RobinJing
With starcoder2-3B, the 2nd+ token latency performs poorly. Do you have any ideas about it? Thanks!
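For reference, here is a minimal sketch (my own, not from the thread) of how first-token vs. 2nd+ token latency can be separated with the ipex-llm transformers API; the model id, prompt, and `load_in_4bit` setting are assumptions:

```python
# Sketch: estimate 2nd+ token latency by subtracting the first-token cost
# from a longer generation run. Assumes an XPU build of PyTorch/ipex-llm.
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "bigcode/starcoder2-3b"  # assumption: HF id for starcoder2-3B
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
).to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to("xpu")

# One-token run captures first-token (prefill) latency.
torch.xpu.synchronize()
t0 = time.perf_counter()
model.generate(**inputs, max_new_tokens=1)
torch.xpu.synchronize()
first_token = time.perf_counter() - t0

# Longer run: the remaining tokens give the average 2nd+ token latency.
n_new = 64
t0 = time.perf_counter()
model.generate(**inputs, max_new_tokens=n_new)
torch.xpu.synchronize()
total = time.perf_counter() - t0

print(f"1st token: {first_token:.3f}s, "
      f"2nd+ token avg: {(total - first_token) / (n_new - 1):.3f}s")
```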
OS: Linux Ubuntu 22.04, Kernel: 5.13, GPU: A770, Platform: RPL-P. After installing and starting ollama following the guide, queries get no response, and there is no output on the ollama side.

|ID|Device Type       |Name             |Compute capability|Max compute units|Max work group|Max sub group|Global mem size|
| 0|[level_zero:gpu:0]|Intel(R) Arc(TM)...
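Before suspecting ollama itself, it may help to confirm that the XPU runtime can actually see the A770. A quick sanity-check sketch (my own, assuming an XPU-enabled PyTorch with IPEX installed):

```python
# Verify the Arc GPU is visible to the XPU backend that ipex-llm/ollama rely on.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the "xpu" device
                                            # on older PyTorch; 2.5+ has torch.xpu built in)

print("XPU available:", torch.xpu.is_available())
for i in range(torch.xpu.device_count()):
    print(i, torch.xpu.get_device_name(i))
```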
Hi, MiniCPM-V is a very popular model these days. Although it is not yet supported by the current versions of llama.cpp and ollama, please refer to the following submission: https://github.com/ollama/ollama/issues/6417...
Hi, MiniCPM-V is a very popular model these days. Although it is not yet supported by the current versions of llama.cpp and ollama, please refer to the following submission: https://github.com/ollama/ollama/issues/6307...
**Describe the bug** B60 performance issue with INT4, using the latest b3 image with vLLM. **How to reproduce** Start vLLM with 1/2/4 cards and a 32B/70B model; you will find the...
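A minimal sketch of this reproduction, assuming the ipex-llm vLLM build exposes the standard vLLM Python entry point; the model id is an assumption, and the exact INT4/low-bit flag used by the b3 image is deployment-specific:

```python
# Sketch: start vLLM across multiple B60 cards and run one request.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct",  # assumption: any 32B/70B checkpoint
    tensor_parallel_size=2,             # 1 / 2 / 4 cards as in the report
    # INT4/low-bit settings are specific to the ipex-llm vLLM image; consult
    # its docs for the exact quantization flag used here.
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```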
**Describe the bug** B60 cannot use FP16 precision with GLM4-32B-0414. **How to reproduce** With dtype float32 and lowbit fp16, the issue occurs. **Screenshots**
**Describe the bug** B60 cannot use FP16 precision with MoE. **How to reproduce** Load Qwen3-30B-A3B with dtype float16 and lowbit fp16; the issue occurs. **Screenshots**
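Both FP16 reports above come down to the same load path. A sketch of it, assuming the ipex-llm transformers API; the Hugging Face model ids are my guesses at the checkpoints named in the issues:

```python
# Sketch: load a model with the low-bit fp16 path that triggers the failure.
import torch
from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B",       # or "THUDM/GLM-4-32B-0414" for the first report
    load_in_low_bit="fp16",     # low-bit fp16 setting from both reports
    torch_dtype=torch.float16,  # the GLM4 report used torch_dtype=torch.float32
    trust_remote_code=True,
).to("xpu")
```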
**Describe the bug** After enabling the TTS parameter, the program crashes with an error. **How to reproduce** Steps to reproduce the error: 1. intelanalytics/ipex-llm-inference-cpp-xpu:latest 2. Follow the setup guidance on IPEX-LLM...