SONG Ge

Hi @RICHES-2020, GPU devices will only be discovered after you load a model. Try it via `ollama run xxx`.
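For instance (a minimal sketch; the model name below is only a placeholder, use any model you have pulled):

```bash
# Start the server in one terminal
ollama serve

# Load any model in another terminal; GPU devices are discovered at load
# time, so the server log should list your GPU(s) only after this step
ollama run llama3.2 "hello"
```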

Could you try running `gemma3:4b` to confirm whether it's related to VRAM?
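For example, `ollama ps` reports the CPU/GPU split of a loaded model, which helps tell whether VRAM is the limit:

```bash
# Load the 4B model; if VRAM is insufficient, layers fall back to CPU
ollama run gemma3:4b "hi"

# In another terminal, check the PROCESSOR column (e.g. "100% GPU")
ollama ps
```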

Hi all, we are working on upgrading the ipex-llm ollama version to 0.6.2 to restore gemma3 support. Until then, you may run `gemma3:1b`. For more details, please see https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md.
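Until the upgrade lands, the workaround is:

```bash
# The 1B variant still runs on the current ipex-llm ollama build
ollama run gemma3:1b
```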

@chungyehwangai, we are working on upgrading ipex-llm ollama to v0.6.x, which will support `gemma3:4b/12b/27b`; we should be able to release it soon.

Hi @tristan-k @chungyehwangai , I will release an initial version to support gemma3, maybe next Monday or Tuesday.

> Does the [2.3.0-nightly build](https://github.com/ipex-llm/ipex-llm/releases/tag/v2.3.0-nightly) add support for Gemma3?

We have added support for gemma3-fp16.
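As a rough illustration (the tag below is an assumption; verify the exact fp16 tag in the model registry before pulling):

```bash
# Run the fp16 variant of gemma3; the tag name is an example only
ollama run gemma3:4b-it-fp16
```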

Hi all, you may install our latest version of ipex-llm ollama via `pip install --pre --upgrade ipex-llm[cpp]` to run gemma3 as below: 1. Run Ollama with GGUF Model on **ModelScope**...
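The step list above is truncated; as a sketch of the install plus step 1, assuming a ModelScope-hosted gemma3 GGUF (the repo path is a placeholder, not a confirmed location):

```bash
# Upgrade to the latest pre-release ipex-llm ollama backend
pip install --pre --upgrade ipex-llm[cpp]

# Run a GGUF model pulled directly from ModelScope;
# substitute the actual gemma3 GGUF repo path
ollama run modelscope.cn/<namespace>/<gemma3-gguf-repo>
```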

> [@sgwhat](https://github.com/sgwhat) No Ollama Portable Zip?

@ExplodingDragon @yizhangliu It's being released. You may see https://github.com/intel/ipex-llm/issues/12963#issuecomment-2731897924 to run it first.

> After deployment, I asked a few questions about pictures, but the answers were incorrect. I used LM-Studio for deployment and there was no problem with answering picture questions.

Hi...

Yes, we already support the qwen3-30b-a3b MoE model; it takes ~19 GB of memory. You may set `OLLAMA_SET_OT` before starting the ollama server to offload only part of the MoE layers to the GPU, ...
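For example (a sketch assuming a Linux shell; the override value follows llama.cpp-style `tensor-pattern=device` rules, so adjust the pattern to your model and VRAM budget):

```bash
# Keep all MoE expert tensors on CPU so only the dense weights use VRAM;
# "exps" matches the expert weight names in the GGUF
export OLLAMA_SET_OT="exps=CPU"

# A finer-grained variant (assumed pattern): keep only the experts of
# layers 20-39 on CPU and offload the rest to GPU
# export OLLAMA_SET_OT="(2[0-9]|3[0-9])\.ffn_.*_exps\.=CPU"

ollama serve
```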