
Unable to use GLM model

Open · RonkyTang opened this issue 9 months ago · 30 comments

Describe the bug
An error occurs when using the following GLM models:

- https://www.modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat-gguf
- https://www.modelscope.cn/models/ZhipuAI/glm-edge-v-2b-gguf
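For context, a GGUF downloaded from ModelScope is typically registered with Ollama through a Modelfile; a minimal sketch follows (the file name and model tag are assumptions, not taken from the report):

```bash
# Register the downloaded GGUF with Ollama, then run it;
# loading the model is where the missing-tensor error below surfaces.
cat > Modelfile <<'EOF'
FROM ./glm-edge-1.5b-chat-q4_k_m.gguf
EOF
ollama create glm-edge-1.5b-chat -f Modelfile
ollama run glm-edge-1.5b-chat
```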

Screenshots: [screenshot]

Error messages:

    llama runner process has terminated: error loading model: missing tensor 'blk.0.attn_qkv.weight'
    llama_load_model_from_file: failed to load model

    llama_model_load: error loading model: missing tensor 'blk.0.attn_qkv.weight'
    llama_load_model_from_file: failed to load model
    panic: unable to load model: /root/.ollama/models/blobs/sha256-1d4816cb2da5ac2a5acfa7315049ac9826d52842df81ac567de64755986949fa

    goroutine 20 [running]:
    ollama/llama/runner.(*Server).loadModel(0xc0004b2120, {0x3e7, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, 0xc000502dd0, 0x0}, ...)
        ollama/llama/runner/runner.go:861 +0x4ee
    created by ollama/llama/runner.Execute in goroutine 1
        ollama/llama/runner/runner.go:1001 +0xd0d
    time=2025-03-26T11:22:38.876+08:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: error loading model: missing tensor 'blk.0.attn_qkv.weight'"

RonkyTang · Mar 26 '25

Hi @RonkyTang, we are working on upgrading ipex-llm's Ollama to a new version; these two GLM models will be supported then.

sgwhat · Mar 27 '25

> Hi @RonkyTang, we are working on upgrading ipex-llm's Ollama to a new version; these two GLM models will be supported then.

Thanks!

RonkyTang · Mar 28 '25

Hi @sgwhat, could you please share the schedule for the release? Thanks!

hli25 · Apr 01 '25

> Hi @sgwhat, could you please share the schedule for the release? Thanks!

I will release v0.6.x support next week.

sgwhat · Apr 03 '25

[screenshot]

Two issues were identified when using the glm-edge-v-2b-gguf model (https://www.modelscope.cn/models/ZhipuAI/glm-edge-v-2b-gguf):

  1. Long inference time.
  2. The returned content is entirely incorrect.

With the official version of Ollama, everything works normally.

RonkyTang · Apr 14 '25

Hi @RonkyTang, I have found the cause, and it will be fixed in tomorrow's release.

sgwhat · Apr 14 '25

> Hi @RonkyTang, I have found the cause, and it will be fixed in tomorrow's release.

Thanks!

RonkyTang · Apr 15 '25

Hi @sgwhat, once your fix is ready, please drop us a message so we can give it a try, thanks! cc @RonkyTang

hli25 · Apr 15 '25

Hi @RonkyTang, I am still working on getting this model's CLIP part to run on the SYCL backend. I will get back to you in a few days once this issue is fixed.

sgwhat · Apr 15 '25

Hi @sgwhat, could you share the current progress? Thank you.

RonkyTang · Apr 18 '25

Hi @RonkyTang, we have released the new version of Ollama at https://github.com/intel/ipex-llm/releases/tag/v2.3.0-nightly. We have optimized the CLIP model to run on the GPU on Windows.

sgwhat · Apr 18 '25

> Hi @RonkyTang, we have released the new version of Ollama at https://github.com/intel/ipex-llm/releases/tag/v2.3.0-nightly. We have optimized the CLIP model to run on the GPU on Windows.

Hi @sgwhat, thank you for your reply. But there is still a problem: loading multimodal models takes a few minutes:

[screenshot]

RonkyTang · Apr 18 '25

Hi @RonkyTang, it seems that on Ubuntu, CLIP is still forced to run on the CPU (it works well with great performance on Windows). This has been fixed, and I will release the fixed version tomorrow.

sgwhat · Apr 21 '25

Hi @RonkyTang, we have released the optimized version for Ubuntu, which can run the CLIP model on the GPU. You may install it via `pip install --pre --upgrade ipex-llm[cpp]`.
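For reference, a minimal install-and-launch sketch based on the command above (the `init-ollama` step and directory layout follow the ipex-llm Ollama quickstart; adjust if your version differs):

```bash
# Install the pre-release ipex-llm with the llama.cpp/Ollama backend
pip install --pre --upgrade "ipex-llm[cpp]"

# Create symlinks to the ipex-llm Ollama binary in a working directory
# (per the ipex-llm quickstart; Windows uses init-ollama.bat instead)
mkdir -p ollama-work && cd ollama-work
init-ollama

# Start the server; pull and run models from another shell
./ollama serve
```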

sgwhat · Apr 22 '25

> Hi @RonkyTang, we have released the optimized version for Ubuntu, which can run the CLIP model on the GPU. You may install it via `pip install --pre --upgrade ipex-llm[cpp]`.

Hi @sgwhat, so do you mean we need to install an ipex-llm environment on the runtime device?

RonkyTang · Apr 22 '25

Yes, in the conda env. You may refer to this installation guide.
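A minimal sketch of such a conda environment (the env name and Python version here are illustrative; the linked guide is authoritative):

```bash
# Create and activate a dedicated conda environment on the runtime device
conda create -n llm python=3.11 -y
conda activate llm

# Install ipex-llm with the llama.cpp/Ollama backend into that env
pip install --pre --upgrade "ipex-llm[cpp]"
```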

sgwhat · Apr 22 '25

Hi @sgwhat, the preview version has a problem: we can't use the iGPU:

[screenshot]

but the release version can use it:

[screenshot]

RonkyTang · Apr 25 '25

This is expected behavior — Ollama does not utilize the iGPU until a model is loaded, at which point you will see VRAM usage increase. As for the confusing log message, I will remove it later. @RonkyTang
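One way to confirm this on Linux, as a hedged sketch (`intel_gpu_top` comes from the intel-gpu-tools package; the model name is illustrative):

```bash
# Terminal 1: watch iGPU engine utilization (usually needs root)
sudo intel_gpu_top

# Terminal 2: load a model; engine/VRAM usage should rise once it loads
ollama run glm-edge-1.5b-chat "hello"
```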

sgwhat · Apr 25 '25

> This is expected behavior: Ollama does not utilize the iGPU until a model is loaded, at which point you will see VRAM usage increase. As for the confusing log message, I will remove it later. @RonkyTang

So do you mean the preview version uses the iGPU?

RonkyTang · Apr 25 '25

> So do you mean the preview version uses the iGPU?

Yes, you may load a model to check.

sgwhat · Apr 25 '25

OK, I hope it's just a log printing error. [screenshot]

RonkyTang · Apr 25 '25

Hi @sgwhat, how can we make an Ollama-like portable package? I copied all the libraries that the ollama binary depends on into the ollama-bin directory and set the environment variables, but the model cannot run properly.
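For reference, a rough sketch of the bundling approach described above (paths are illustrative; note that the SYCL/oneAPI runtime libraries likely also need to be on the library path, which may be why the model fails to run):

```bash
# Copy every shared library the ollama binary resolves into a local dir
mkdir -p ollama-bin/libs
ldd ollama-bin/ollama | awk '/=> \// {print $3}' | xargs -I{} cp -v {} ollama-bin/libs/

# Point the dynamic loader at the bundled libraries, then start the server
export LD_LIBRARY_PATH="$PWD/ollama-bin/libs:$LD_LIBRARY_PATH"
./ollama-bin/ollama serve
```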

RonkyTang · Apr 25 '25

> Hi @sgwhat, how can we make an Ollama-like portable package? I copied all the libraries that the ollama binary depends on into the ollama-bin directory and set the environment variables, but the model cannot run properly.

Hi @sgwhat, we have also found another problem: with the ipex-llm Ollama version, there is continuous memory usage of 17% (model: glm-edge-1.5b):

[screenshot]

but with the official Ollama version, memory usage is only 4-5% (same model):

[screenshot]

RonkyTang · Apr 30 '25

Hi @RonkyTang, we have released a new Ollama version: https://www.modelscope.cn/models/Intel/ollama

sgwhat · Apr 30 '25

> Hi @RonkyTang, we have released a new Ollama version: https://www.modelscope.cn/models/Intel/ollama

Hi @sgwhat, thank you for the update. But it still has the memory issue.

RonkyTang · Apr 30 '25

> Hi @sgwhat, how can we make an Ollama-like portable package? I copied all the libraries that the ollama binary depends on into the ollama-bin directory and set the environment variables, but the model cannot run properly.
>
> Hi @sgwhat, we have also found another problem: with the ipex-llm Ollama version, there is continuous memory usage of 17% (model: glm-edge-1.5b): [screenshot]
>
> but with the official Ollama version, memory usage is only 4-5% (same model): [screenshot]

Hi @sgwhat, any update on this?

RonkyTang · May 09 '25

> Hi @sgwhat, how can we make an Ollama-like portable package? I copied all the libraries that the ollama binary depends on into the ollama-bin directory and set the environment variables, but the model cannot run properly.
>
> Hi @sgwhat, we have also found another problem: with the ipex-llm Ollama version, there is continuous memory usage of 17% (model: glm-edge-1.5b): [screenshot] but with the official Ollama version, memory usage is only 4-5% (same model): [screenshot]
>
> Hi @sgwhat, any update on this?

Hi @sgwhat, any update on this?

RonkyTang · May 16 '25

@sgwhat any comment on this issue?

@RonkyTang could you please check which ollama process uses more memory? You can run `top` and then press `M` to sort processes by memory usage. At the same time, you could run `free -h` to check whether the memory is allocated to buff/cache.
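The same checks as non-interactive commands, for convenience (standard procps options):

```bash
# List processes sorted by resident memory, one snapshot
top -b -n 1 -o %MEM | head -n 20

# Check how much memory sits in buff/cache versus process usage
free -h
```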

hli25 · May 20 '25

Hi @RonkyTang, I apologize for the late reply. The memory usage depends on many factors, including the values of num_parallel and num_ctx; you can try adjusting these parameters to check. Additionally, we've just released the latest version of Ollama; you may try running this version and share the actual memory usage with me.
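For reference, hedged examples of adjusting the two parameters mentioned above through standard Ollama mechanisms (values and model names are illustrative):

```bash
# num_parallel: set via environment variable before starting the server
export OLLAMA_NUM_PARALLEL=1
./ollama serve

# num_ctx: set per model in a Modelfile...
cat > Modelfile <<'EOF'
FROM glm-edge-1.5b-chat
PARAMETER num_ctx 2048
EOF
ollama create glm-edge-small-ctx -f Modelfile

# ...or per request via the REST API options
curl http://localhost:11434/api/generate -d '{
  "model": "glm-edge-1.5b-chat",
  "prompt": "hello",
  "options": {"num_ctx": 2048}
}'
```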

sgwhat · May 20 '25

> Hi @RonkyTang, I apologize for the late reply. The memory usage depends on many factors, including the values of num_parallel and num_ctx; you can try adjusting these parameters to check. Additionally, we've just released the latest version of Ollama; you may try running this version and share the actual memory usage with me.

Hi @sgwhat, the problem is fixed in the new version. Thanks for your help. Please also take a look at our other issue: https://github.com/intel/ipex-llm/issues/13192

RonkyTang · May 27 '25