Ruonan Wang
Hi @brownplayer , ipex-llm's ollama is upgraded to 0.3.6 with `ipex-llm[cpp]>=2.2.0b20240827`; you may give the latest llama.cpp / ollama a try 😊
Hi @hvico , could you please also provide us with your detailed cmd so that we can try to reproduce it?
Hi all, gemma3 is supported in ipex-llm's ollama from `ipex-llm[cpp]==2.3.0b20250529`. You could try it again after `pip install ipex-llm[cpp]==2.3.0b20250529`.
Yes, I can get a similar result with a standalone bmm op. I found that if I just loop bmm with the same input, fp16 is much faster than fp32. However,...
@jgong5 Thanks for the reply! I have updated the test script based on your comment: I now exclude the rand time and added a warmup. Now the time of aten::bmm seems almost...
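For reference, here is a minimal sketch of the benchmarking pattern described above: loop `torch.bmm` over the same pre-generated inputs, warm up first, and keep the `randn` calls outside the timed region. The shapes, iteration counts, and function name are placeholders, not the values from the actual test script, and fp16 bmm on CPU assumes a reasonably recent PyTorch build.

```python
import time
import torch

def bench_bmm(dtype, batch=64, m=128, k=128, n=128, warmup=10, iters=100):
    # Generate inputs once, outside the timed loop, so rand() time
    # is not attributed to aten::bmm.
    a = torch.randn(batch, m, k, dtype=dtype)
    b = torch.randn(batch, k, n, dtype=dtype)
    for _ in range(warmup):          # warmup iterations, not timed
        torch.bmm(a, b)
    start = time.perf_counter()
    for _ in range(iters):           # timed loop reusing the same inputs
        torch.bmm(a, b)
    return (time.perf_counter() - start) / iters

for dt in (torch.float32, torch.float16):
    print(dt, f"{bench_bmm(dt) * 1e3:.3f} ms/iter")
```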