
Have you tested the speed in Ollama?

Open allenxml opened this issue 1 year ago • 1 comments

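I imported the 4-bit quantized GGUF model from Hugging Face into Ollama and asked it questions through Open WebUI, and the output is very slow. On the Ollama host, the 4060 Ti 16 GB graphics card has only 8 GB of VRAM in use, the GPU core clock often sits at 210 and rarely reaches its maximum, and the 7950X CPU is at 50% usage.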

allenxml avatar Aug 23 '24 16:08 allenxml

I will provide a Google translation so you don't have to.

Have you tested the speed in Ollama? #15

I imported the 4-bit quantized GGUF model from Hugging Face into Ollama and asked questions in Open WebUI. The output speed is very slow.

On the Ollama host, the 4060 Ti 16 GB card uses only 8 GB of video memory, the GPU core frequency often sits at 210 and rarely reaches the maximum, and the 7950X CPU usage is 50%.
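
To put a number on "very slow", it can help to measure decode speed directly against Ollama rather than through Open WebUI. The following is a minimal sketch, not something from the original report: it assumes Ollama is listening on its default local port, and the model name `longwriter` is a hypothetical placeholder for whatever name the GGUF was imported under. It relies on the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama returns from a non-streaming `/api/generate` call.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; change it if the host differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Compute decode speed from Ollama's eval_count and eval_duration (ns)."""
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def measure(model: str, prompt: str) -> float:
    """Send one non-streaming generate request and return tokens per second."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))


if __name__ == "__main__":
    # "longwriter" is a hypothetical model name used only for illustration.
    speed = measure("longwriter", "Write a short paragraph about GPUs.")
    print(f"{speed:.1f} tokens/s")
```

Running `ollama ps` while the model is loaded should also show how the model is split between CPU and GPU, which may help explain a 50% CPU load alongside a mostly idle GPU.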

kyuumeitai avatar Aug 30 '24 13:08 kyuumeitai