Kerfuffle

Results: 159 comments by Kerfuffle

In addition to the previous points, if you have a good CPU and a relatively weak GPU then offloading to the GPU can still be slower even if a model...

Unfortunately, this was designed before those context changes. Also, one of my main goals was to try to avoid running into GGML asserts that would bring the whole process down,...

By "dependent type" you mean the const generics dimensions stuff? I don't disagree with that, but there are unfortunately a bunch of other issues as well. I'm not too sure...

#2813 - still need to implement the non-tricky version. Related, there's #2969 - also should be a 50% memory use reduction.

A fair point. Keep in mind this is super early in development and it's very likely I will make changes that break all existing code. So the project currently probably...

> Their length is stored before the key. They have a maximal length of 256

255 makes more sense if you're going to use a byte to store the length....

> Considering that the safetensors project already answers the question

You'd have to fork it to do that; they don't seem interested in extending it. Based on existing discussion, it...

I agree with all of the above, but that's basically what I'd call "forking" it. Taking that project and basing another one on it that takes a different approach, has...

Extremely unlikely that's the problem. These models aren't ChatGPT, which has had extensive training to chat. Also, the interface is doing fancy stuff behind the scenes and fixing up the...

It works exactly the same as `-p` except it lets the prompt be read from a file rather than passed as a string. So I'm not quite sure what you mean. If...