Ruonan Wang


Hi @brownplayer , ipex-llm's ollama has been upgraded to 0.3.6 with `ipex-llm[cpp]>=2.2.0b20240827`; you may give the latest llama.cpp / ollama a try 😊

Hi @hvico , could you also provide us with your detailed command so that we can try to reproduce it?

Hi all, gemma3 is supported in ollama starting from `ipex-llm[cpp]==2.3.0b20250529`. You could try it again after `pip install ipex-llm[cpp]==2.3.0b20250529`.

Yes, I can get a similar result with a standalone bmm op. I found that if I just loop bmm with the same input, fp16 is much faster than fp32. However,...

@jgong5 Thanks for the reply! I have updated the test script based on your comment: I removed the rand time from the measurement and added a warmup. Now the time of aten::bmm seems almost...
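The methodology described above (generate inputs outside the timed region, warm up before measuring) can be sketched as follows. This is a minimal illustration, not the actual test script: NumPy's batched `matmul` stands in for `torch.bmm`, and the function name, shapes, and iteration counts are all made up for the example.

```python
import time
import numpy as np

def bench_bmm(dtype, batch=8, m=64, k=64, n=64, warmup=5, iters=50):
    """Time a batched matmul, excluding input generation and warmup."""
    # Generate inputs once, outside the timed region, so rand time
    # is not counted in the measurement.
    a = np.random.rand(batch, m, k).astype(dtype)
    b = np.random.rand(batch, k, n).astype(dtype)
    # Warmup iterations so first-call effects don't skew the timing.
    for _ in range(warmup):
        np.matmul(a, b)
    start = time.perf_counter()
    for _ in range(iters):
        np.matmul(a, b)
    return (time.perf_counter() - start) / iters  # seconds per iteration

fp32_t = bench_bmm(np.float32)
fp16_t = bench_bmm(np.float16)
print(f"fp32: {fp32_t * 1e6:.1f} us/iter, fp16: {fp16_t * 1e6:.1f} us/iter")
```

Note that relative fp16/fp32 speed here reflects NumPy's CPU kernels, not the XPU behavior under discussion; the point is only the structure of the timing loop.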