leeaction

Results: 18 comments by leeaction

GGUF format is good for Ollama users. Any update?

Any update? I'm really looking forward to it.

> Created a PR: [ggerganov/llama.cpp#6919](https://github.com/ggerganov/llama.cpp/pull/6919). I created a folder called "minicpmv" in the examples folder of llama.cpp. More detail can be seen in `llama.cpp/examples/minicpmv/README.md`

Hi [Achazwl](https://github.com/Achazwl), can you provide the...

> > > Created a PR: [ggerganov/llama.cpp#6919](https://github.com/ggerganov/llama.cpp/pull/6919). I created a folder called "minicpmv" in the examples folder of llama.cpp. More detail can be seen in `llama.cpp/examples/minicpmv/README.md`
>
> ...

> @leeaction Are you sure that your Ollama supports MiniCPM-V-2? This model may need manual compilation with [the PR](https://github.com/ggerganov/llama.cpp/pull/6919) applied.

Hmm, I imported the GGUF file below into Ollama....

> Yes, my quantized models were built with the PR.
>
> > OR.... should I compile the Ollama binary with the PR locally and use that?
>
> Yes,...

Roughly how much VRAM, at minimum, does the 4-bit quantized model InternVL-Chat-V1-5-4bit need?

Having the GGUF file alone isn't enough; you also need to compile Ollama's backend, llama.cpp.
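A minimal sketch of what that build might look like, assuming you patch a standalone llama.cpp checkout with the MiniCPM-V PR mentioned in this thread (exact steps may differ depending on your llama.cpp and Ollama versions):

```sh
# Hedged sketch: build llama.cpp with the MiniCPM-V changes from PR #6919.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Fetch and check out the PR head (PR number taken from this thread).
git fetch origin pull/6919/head && git checkout FETCH_HEAD
make
```

Note that Ollama vendors its own copy of llama.cpp, so to get the new model support into Ollama itself you would have to rebuild Ollama against the patched backend rather than just building llama.cpp standalone.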

> Create an Ollama Modelfile like:
>
> ```
> FROM ./ggml-model-Q4_K_M.gguf
> TEMPLATE "{{ if .System }}system
> {{ .System }}{{ end }}{{ if .Prompt }}user
> {{ .Prompt }}{{...
> ```
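Once a Modelfile like that is in place, the standard Ollama workflow applies (the model name `minicpm-v` below is just an illustrative choice, not from the thread):

```sh
# Register the model from the Modelfile, then start an interactive session.
ollama create minicpm-v -f Modelfile
ollama run minicpm-v
```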

What? You're getting that speed on a V100? A single step takes about 40s on my V100. Did you quantize the model?