
Results 34 comments of Lockinwize Lolite

> GGML_ASSERT(P >= 0)

I encountered that too.

> The solution described above (wrapping the `GGML_ASSERT(P >= 0);` in that line of code in an if) works to fix the...

> I also see very bad quality with flash attention

But that is just because flash attention is not following the prompt at all. That is a bug unrelated to...

> @mzwing Would my latest commit [ggerganov/llama.cpp@a76fbcd](https://github.com/ggerganov/llama.cpp/commit/a76fbcd05054e39e8be325c10320397775d42ac3) that tidies the modifications outside the minicpmv folder change the gguf file?

Not tested yet (I will update the Huggingface repo tonight...

@leeaction Are you sure that your ollama supports MiniCPM-V-2? This model may need a manual build with [the PR](https://github.com/ggerganov/llama.cpp/pull/6919) applied.

Yes, my quantized models were built with the PR.

> OR.... I should to compile ollama bin with the PR locally and use it?

Yes, you should.

> I will update the Huggingface repo tonight if the gguf files are changed.

@Achazwl @leeaction I confirm that the gguf files are changed. Uploading... Please wait for a while.

> Uploading... Please wait for a while.

Uploaded. Please test. @leeaction

> Can you please upload Q5_K_M if you have time? This has a good tradeoff between quality and size

~~I will have a try at noon (UTC+8).~~ I encountered some...

@huisai Did you enable CLBlast acceleration? That timing is definitely abnormal, unless your phone is running some high-load processes in the background. Also, which quantization level is your model?

> Is deployment still not supported? This page says it is still under development: https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5/examples/server

According to https://github.com/ggerganov/llama.cpp/pull/7599, development should be nearly complete. Besides, you can actually deploy it even while it is still under development :) You may just need to compile it yourself.