Jason Dai

106 comments by Jason Dai

@MeouSker77 please take a look

It uses the iGPU; as mentioned in the README, please refer to [[2]](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-meta-llama3-with-intel-ai-solutions.html)[[3]](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-microsoft-phi-3-models-intel-ai-soln.html)[[4]](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-ai-solutions-accelerate-alibaba-qwen2-llms.html) for more details.

> Hi @cyita, does that mean that [ollama with ipex-llm](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md?rgh-link-date=2024-09-24T14%3A53%3A15Z) _does_ support flash attention for supported models automatically?
>
> As far as I know, the ipex-llm backend does...

> I'm in the same boat. I recently bought a NUC14 with Intel Arc Graphics (iGPU 7Xe/112EU/896SP, Xe-LPG / Gen 12.7) but I'm unable to use Intel oneAPI because...

> Where is “ollama-ipex-llm-2.3.0b20250428-ubuntu.tgz”?

See https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md

> So slow update, sad. Intel, are you OK?

Our current version is consistent with [v0.4.6](https://github.com/ollama/ollama/releases/tag/v0.4.6) of ollama. See https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md

@plusbang please take a look

Also update the links in https://github.com/intel-analytics/ipex-llm/blob/main/README.zh-CN.md

> I've tried the official gemma3 models in 4b and 12b, as well as the q4_K_M versions from ollama, and also `lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M` from huggingface, and none of them seem...