Ruonan Wang
Hi @brownplayer , ipex-llm's ollama is upgraded to 0.3.6 with `ipex-llm[cpp]>=2.2.0b20240827`; you may give the latest llama.cpp / ollama a try 😊
Hi @hvico , could you please also provide us with your detailed cmd so that we can try to reproduce it?
Hi all, gemma3 is supported in ipex-llm's ollama from `ipex-llm[cpp]==2.3.0b20250529`. You could try it again after `pip install ipex-llm[cpp]==2.3.0b20250529`.
Yes, I can get a similar result with a standalone bmm op. I found that if I just loop bmm with the same input, fp16 is much faster than fp32. However,...
@jgong5 Thanks for the reply! I have updated the test script based on your comment: I now exclude the rand time and added a warmup. Now the time of aten::bmm seems almost...
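For reference, here is a minimal sketch of the benchmarking pattern described above: loop `torch.bmm` over the same pre-generated inputs, warm up first, and keep the `randn` calls outside the timed region. The shapes, iteration counts, and function name are placeholders, not the values from the actual test script, and fp16 bmm on CPU assumes a reasonably recent PyTorch build.

```python
import time
import torch

def bench_bmm(dtype, batch=64, m=128, k=128, n=128, warmup=10, iters=100):
    # Generate inputs once, outside the timed loop, so rand() time
    # is not attributed to aten::bmm.
    a = torch.randn(batch, m, k, dtype=dtype)
    b = torch.randn(batch, k, n, dtype=dtype)
    for _ in range(warmup):          # warmup iterations, not timed
        torch.bmm(a, b)
    start = time.perf_counter()
    for _ in range(iters):           # timed loop reusing the same inputs
        torch.bmm(a, b)
    return (time.perf_counter() - start) / iters

for dt in (torch.float32, torch.float16):
    print(dt, f"{bench_bmm(dt) * 1e3:.3f} ms/iter")
```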