Kerfuffle

Results: 159 comments by Kerfuffle

In addition to the previous points, if you have a good CPU and a relatively weak GPU then offloading to the GPU can still be slower even if a model...

Unfortunately, this was designed before those context changes. Also, one of my main goals was to try to avoid running into GGML asserts that would bring the whole process down,...

By "dependent type" you mean the const generics dimensions stuff? I don't disagree with that, but there are unfortunately a bunch of other issues as well. I'm not too sure...

#2813 - still need to implement the non-tricky version. Related, there's #2969 - also should be a 50% memory use reduction.

A fair point. Keep in mind this is super early in development and it's very likely I will make changes that break all existing code. So the project currently probably...

> Their length is stored before the key. They have a maximal length of 256

255 makes more sense if you're going to use a byte to store the length....

> Considering that the safetensors project already answers the question

You'd have to fork it to do that; they don't seem interested in extending it. Based on existing discussion, it...

I agree with all of the above, but that's basically what I'd call "forking" it. Taking that project and basing another one on it that takes a different approach, has...

Extremely unlikely that's the problem. These models aren't ChatGPT, which has had extensive training to chat. Also, the interface is doing fancy stuff behind the scenes and fixing up the...

It works exactly the same as `-p` except it lets the prompt be read from a file rather than passed as a string. So I'm not quite sure what you mean. If...