Sunija

Results 6 comments of Sunija
👋 **TL;DR: CLBlast might not be faster.** If there is a way to increase prompt evaluation on this system, I'd be *highly* interested. I got a similar setup, though I...

> TBH you should test a Vulkan backend like mlc-llm. There isn't really a good way to leverage UHD 620 in llama.cpp yet, especially with max context prompts like that....

> The OpenCL code in llama.cpp can run 4-bit generation on the GPU now, too, but it requires the model to be loaded to VRAM, which integrated GPUs don't have...

What I'm mostly wondering is: A) Is it physically impossible to increase the speed by using the GPU, or... B) ...is this just a software issue, because the current libraries don't...

You cannot, at the moment. But it will come as a feature in one of the next versions.

Sadly not in the current implementation. :( Code could be adapted to allow it.