Performance - heads up
Just a heads up, given it's been more than a week since the last release. I'm deep in a complete overhaul of a range of behaviors and functions. The core focus is to increase performance significantly. That includes a lot of rework in the CUDA code, many new kernels, changes to GPU offloading and memory management, and changes to GPU operations. Secondarily, I'm working on restructuring the application to be more flexible for the future.
I'm sitting on hundreds of changes, small and large, and that all takes its toll on finishing in a timely manner. So a bit of patience will be needed.
Thanks for the update :) Sometimes it's not easy to split things into convenient, small PRs.
I wanted to release at least a test branch before I go on vacation, but it doesn't look like I'll get to that state. So there will be a 2-week delay now :(
I guess I started too much simultaneously: new scheduling in ggml, new CUDA kernels for 16-bit and 8-bit processing, a ton of changes, fixes and improvements, and steps toward full offloading to the GPU with broadcasting support. But all of it is in a half-finished state. Some of the ggml-cuda designs are not ideal for what we need, and working through them (or trying to work around them) has already taken days.
Update: As tends to happen with vacations, a lot of stuff got postponed, so ggllm development has slowed down. I'm still on it, but not at the usual pace until I've finished my backlog.