
Unable to use GLM model

Open · RonkyTang opened this issue 9 months ago · 30 comments

Describe the bug
An error occurs when using the following GLM models:

- https://www.modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat-gguf
- https://www.modelscope.cn/models/ZhipuAI/glm-edge-v-2b-gguf
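For context, a GGUF downloaded from ModelScope is typically registered with Ollama through a Modelfile; a minimal sketch follows (the file name and model tag are assumptions, not taken from the report):

```bash
# Register the downloaded GGUF with Ollama, then run it;
# loading the model is where the missing-tensor error below surfaces.
cat > Modelfile <<'EOF'
FROM ./glm-edge-1.5b-chat-q4_k_m.gguf
EOF
ollama create glm-edge-1.5b-chat -f Modelfile
ollama run glm-edge-1.5b-chat
```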

Screenshots: [screenshot]

Error messages:

    llama runner process has terminated: error loading model: missing tensor 'blk.0.attn_qkv.weight'
    llama_load_model_from_file: failed to load model

    llama_model_load: error loading model: missing tensor 'blk.0.attn_qkv.weight'
    llama_load_model_from_file: failed to load model
    panic: unable to load model: /root/.ollama/models/blobs/sha256-1d4816cb2da5ac2a5acfa7315049ac9826d52842df81ac567de64755986949fa

    goroutine 20 [running]:
    ollama/llama/runner.(*Server).loadModel(0xc0004b2120, {0x3e7, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, 0xc000502dd0, 0x0}, ...)
        ollama/llama/runner/runner.go:861 +0x4ee
    created by ollama/llama/runner.Execute in goroutine 1
        ollama/llama/runner/runner.go:1001 +0xd0d
    time=2025-03-26T11:22:38.876+08:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: error loading model: missing tensor 'blk.0.attn_qkv.weight'"

RonkyTang · Mar 26 '25

Hi @RonkyTang, we are working on upgrading ipex-llm's Ollama to a new version; these two GLM models will be supported then.

sgwhat · Mar 27 '25

> Hi @RonkyTang, we are working on upgrading ipex-llm's Ollama to a new version; these two GLM models will be supported then.

Thanks!

RonkyTang · Mar 28 '25

Hi @sgwhat, could you please share the schedule for the release? Thanks!

hli25 · Apr 01 '25

> Hi @sgwhat, could you please share the schedule for the release? Thanks!

I will release v0.6.x support next week.

sgwhat · Apr 03 '25

[screenshot]

Two issues were identified when using the glm-edge-v-2b-gguf model (https://www.modelscope.cn/models/ZhipuAI/glm-edge-v-2b-gguf):

  1. Long inference time.
  2. The returned content is entirely incorrect.

With the official version of Ollama, everything works normally.

RonkyTang · Apr 14 '25

Hi @RonkyTang, I have found the cause, and it will be fixed in tomorrow's release.

sgwhat · Apr 14 '25

> Hi @RonkyTang, I have found the cause, and it will be fixed in tomorrow's release.

Thanks!

RonkyTang · Apr 15 '25

Hi @sgwhat, once your fix is ready, please drop us a message so we can give it a try, thanks! cc @RonkyTang

hli25 · Apr 15 '25

Hi @RonkyTang, I am still working on getting this model's CLIP part to run on the SYCL backend. I will get back to you in a few days once this issue is fixed.

sgwhat · Apr 15 '25

Hi @sgwhat, could you share the current progress? Thank you.

RonkyTang · Apr 18 '25

Hi @RonkyTang, we have released the new version of Ollama at https://github.com/intel/ipex-llm/releases/tag/v2.3.0-nightly. We have optimized the CLIP model to run on the GPU on Windows.

sgwhat · Apr 18 '25

> Hi @RonkyTang, we have released the new version of Ollama at https://github.com/intel/ipex-llm/releases/tag/v2.3.0-nightly. We have optimized the CLIP model to run on the GPU on Windows.

Hi @sgwhat, thank you for your reply. But there is still a problem: loading multimodal models takes a few minutes:

[screenshot]

RonkyTang · Apr 18 '25

Hi @RonkyTang, it seems that on Ubuntu, CLIP is still forced to run on the CPU (it works well with great performance on Windows). This has been fixed, and I will release the fixed version tomorrow.

sgwhat · Apr 21 '25

Hi @RonkyTang, we have released the optimized version for Ubuntu, which can run the CLIP model on the GPU. You may install it via `pip install --pre --upgrade ipex-llm[cpp]`.
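For reference, a minimal install-and-launch sketch based on the command above (the `init-ollama` step and directory layout follow the ipex-llm Ollama quickstart; adjust if your version differs):

```bash
# Install the pre-release ipex-llm with the llama.cpp/Ollama backend
pip install --pre --upgrade "ipex-llm[cpp]"

# Create symlinks to the ipex-llm Ollama binary in a working directory
# (per the ipex-llm quickstart; Windows uses init-ollama.bat instead)
mkdir -p ollama-work && cd ollama-work
init-ollama

# Start the server; pull and run models from another shell
./ollama serve
```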

sgwhat · Apr 22 '25

> Hi @RonkyTang, we have released the optimized version for Ubuntu, which can run the CLIP model on the GPU. You may install it via `pip install --pre --upgrade ipex-llm[cpp]`.

Hi @sgwhat, so do you mean we need to install an ipex-llm environment on the runtime device?

RonkyTang · Apr 22 '25

Yes, in the conda env. You may refer to this installation guide.
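A minimal sketch of such a conda environment (the env name and Python version here are illustrative; the linked guide is authoritative):

```bash
# Create and activate a dedicated conda environment on the runtime device
conda create -n llm python=3.11 -y
conda activate llm

# Install ipex-llm with the llama.cpp/Ollama backend into that env
pip install --pre --upgrade "ipex-llm[cpp]"
```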

sgwhat · Apr 22 '25

Hi @sgwhat, the preview version has a problem: we can't use the iGPU:

[screenshot]

but the release version can use it:

[screenshot]

RonkyTang · Apr 25 '25

This is expected behavior — Ollama does not utilize the iGPU until a model is loaded, at which point you will see VRAM usage increase. As for the confusing log message, I will remove it later. @RonkyTang
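One way to confirm this on Linux, as a hedged sketch (`intel_gpu_top` comes from the intel-gpu-tools package; the model name is illustrative):

```bash
# Terminal 1: watch iGPU engine utilization (usually needs root)
sudo intel_gpu_top

# Terminal 2: load a model; engine/VRAM usage should rise once it loads
ollama run glm-edge-1.5b-chat "hello"
```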

sgwhat · Apr 25 '25

> This is expected behavior: Ollama does not utilize the iGPU until a model is loaded, at which point you will see VRAM usage increase. As for the confusing log message, I will remove it later. @RonkyTang

So do you mean the preview version uses the iGPU?

RonkyTang · Apr 25 '25

> So do you mean the preview version uses the iGPU?

Yes, you may load a model to check.

sgwhat · Apr 25 '25

OK, I hope it's just a log printing error. [screenshot]

RonkyTang · Apr 25 '25

Hi @sgwhat, how can we make an Ollama-like portable package? I copied all the libraries that the ollama binary depends on into the ollama-bin directory and set the environment variables, but the model cannot run properly.
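For reference, a rough sketch of the bundling approach described above (paths are illustrative; note that the SYCL/oneAPI runtime libraries likely also need to be on the library path, which may be why the model fails to run):

```bash
# Copy every shared library the ollama binary resolves into a local dir
mkdir -p ollama-bin/libs
ldd ollama-bin/ollama | awk '/=> \// {print $3}' | xargs -I{} cp -v {} ollama-bin/libs/

# Point the dynamic loader at the bundled libraries, then start the server
export LD_LIBRARY_PATH="$PWD/ollama-bin/libs:$LD_LIBRARY_PATH"
./ollama-bin/ollama serve
```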

RonkyTang · Apr 25 '25

> Hi @sgwhat, how can we make an Ollama-like portable package? I copied all the libraries that the ollama binary depends on into the ollama-bin directory and set the environment variables, but the model cannot run properly.

Hi @sgwhat, we have also found another problem: with the ipex-llm Ollama version, there is continuous memory usage of 17% (model: glm-edge-1.5b):

[screenshot]

but with the official Ollama version, memory usage is only 4-5% (same model):

[screenshot]

RonkyTang · Apr 30 '25

Hi @RonkyTang, we have released a new Ollama version: https://www.modelscope.cn/models/Intel/ollama

sgwhat · Apr 30 '25

> Hi @RonkyTang, we have released a new Ollama version: https://www.modelscope.cn/models/Intel/ollama

Hi @sgwhat, thank you for the update. But it still has the memory issue.

RonkyTang · Apr 30 '25

> Hi @sgwhat, how can we make an Ollama-like portable package? I copied all the libraries that the ollama binary depends on into the ollama-bin directory and set the environment variables, but the model cannot run properly.
>
> Hi @sgwhat, we have also found another problem: with the ipex-llm Ollama version, there is continuous memory usage of 17% (model: glm-edge-1.5b): [screenshot]
>
> but with the official Ollama version, memory usage is only 4-5% (same model): [screenshot]

Hi @sgwhat, any update on this?

RonkyTang · May 09 '25

> Hi @sgwhat, how can we make an Ollama-like portable package? I copied all the libraries that the ollama binary depends on into the ollama-bin directory and set the environment variables, but the model cannot run properly.
>
> Hi @sgwhat, we have also found another problem: with the ipex-llm Ollama version, there is continuous memory usage of 17% (model: glm-edge-1.5b): [screenshot] but with the official Ollama version, memory usage is only 4-5% (same model): [screenshot]
>
> Hi @sgwhat, any update on this?

Hi @sgwhat, any update on this?

RonkyTang · May 16 '25

@sgwhat any comment on this issue?

@RonkyTang could you please check which ollama process uses more memory? You can run `top` and then press `M` to sort processes by memory usage. At the same time, you could run `free -h` to check whether the memory is allocated to buff/cache.
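The same checks as non-interactive commands, for convenience (standard procps options):

```bash
# List processes sorted by resident memory, one snapshot
top -b -n 1 -o %MEM | head -n 20

# Check how much memory sits in buff/cache versus process usage
free -h
```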

hli25 · May 20 '25

Hi @RonkyTang, I apologize for the late reply. The memory usage depends on many factors, including the values of num_parallel and num_ctx; you can try adjusting these parameters to check. Additionally, we've just released the latest version of Ollama; you may try running this version and share the actual memory usage with me.
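For reference, hedged examples of adjusting the two parameters mentioned above through standard Ollama mechanisms (values and model names are illustrative):

```bash
# num_parallel: set via environment variable before starting the server
export OLLAMA_NUM_PARALLEL=1
./ollama serve

# num_ctx: set per model in a Modelfile...
cat > Modelfile <<'EOF'
FROM glm-edge-1.5b-chat
PARAMETER num_ctx 2048
EOF
ollama create glm-edge-small-ctx -f Modelfile

# ...or per request via the REST API options
curl http://localhost:11434/api/generate -d '{
  "model": "glm-edge-1.5b-chat",
  "prompt": "hello",
  "options": {"num_ctx": 2048}
}'
```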

sgwhat · May 20 '25

> Hi @RonkyTang, I apologize for the late reply. The memory usage depends on many factors, including the values of num_parallel and num_ctx; you can try adjusting these parameters to check. Additionally, we've just released the latest version of Ollama; you may try running this version and share the actual memory usage with me.

Hi @sgwhat, the problem is fixed in the new version. Thanks for your help. Please also take a look at our other issue: https://github.com/intel/ipex-llm/issues/13192

RonkyTang · May 27 '25