Ruonan Wang

Results 100 comments of Ruonan Wang

Hi @brownplayer, what is your OS and what is your target model? FlashMoE only supports Linux for now.

We may consider supporting FlashMoE on Windows later. But for now, I guess you could run qwen3-30b-3b on Windows with ipex-llm llama.cpp (either via pip install or the portable zip), you...

Hi @shailesh837, gemma3n is supported starting from `ipex-llm[cpp]==2.3.0b20250630`. You could try it first with `pip install --pre --upgrade ipex-llm[cpp]`, or wait for the new ollama portable zip; we will release it...
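For reference, a minimal sketch of upgrading and then confirming the installed build (the `pip show` check is just one way to verify the version and is not from the original comment):

```bash
# Upgrade to a nightly build that includes gemma3n support
pip install --pre --upgrade ipex-llm[cpp]
# Confirm the installed build is 2.3.0b20250630 or newer
pip show ipex-llm
```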

Hi @stereomato, I don't quite understand your problem. Could you please provide us with a detailed running log and the messages you mentioned?

Hi @shailesh837, the new portable zip is uploaded here: https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250630-win.zip . With this new portable zip, you can run gemma3n.

Hi @stereomato, we do not have support for `OLLAMA_FLASH_ATTENTION` yet.

Hi @stereomato, if you want to use an fp8 quantized kv cache, you could try `export IPEX_LLM_QUANTIZE_KV_CACHE=1` before `./ollama serve`. It might work for models running with llamarunner.
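For clarity, a minimal sketch of that sequence (it simply combines the two commands above; whether quantization actually takes effect still depends on the model running with llamarunner):

```bash
# Enable fp8 quantized kv cache for the ollama server started from this shell
export IPEX_LLM_QUANTIZE_KV_CACHE=1
# Start the ipex-llm ollama server as usual
./ollama serve
```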

Take `./ollama run qwen3` for example, the original kv cache output looks like:

```bash
llama_kv_cache_unified: kv_size = 4096, type_k = 'f16', type_v = 'f16', n_layer = 36, can_shift = 1, padding =...
```

Yeah, I just took qwen3 as an example. Actually, for such models with grouped-query attention, quantized kv cache does not bring an obvious benefit.

> What is the more recommended version, the nightly release or the docker container? I tried `gemma3n` with the latest docker container and it doesn't work

Hi @FilipLaurentiu, ...