GLM-4 ollama 加载 glm-4-9b-chat 胡言乱语

cuda: 12.6 transformer: 4.44.0 OS: win10 python: 3.11.4 ollama: 0.3.8 & 0.2.3 配置: RTX3090 12700kf

No response

download gguf model from https://www.modelscope.cn/models/llm-research/glm-4-9b-chat-gguf/files
ollama create xxx
ollama serve & open open-webui

只要我不点停就会一直写下去，没在别的model上发现过这种情况（gemma2-7b\ yi-9b），根据以往记录下了0.2.3的ollama但响应差不多

跑原模型时挺正常

Aug 30 '24 04:08 siegrainwong

https://github.com/THUDM/GLM-4/issues/323 https://github.com/THUDM/GLM-4/issues/333

Aug 30 '24 06:08 zhipuch

开过flash attention，不起作用

Aug 30 '24 07:08 siegrainwong

您好，ollama run glm4 下载的是那个模型呢？怎么指定下载glm-4-9b-chat这个版本呢

Dec 08 '24 06:12 gyhyfj

您好，ollama run glm4 下载的是那个模型呢？怎么指定下载glm-4-9b-chat这个版本呢

您可以事先下载glm-4-9b-chat到本地，或者ollama应该有地方可以设置下载模型的id

Dec 09 '24 10:12 sixsixcoder