Justine Tunney
Sorry I didn't notice this until now! Don't be shy to ping me at [email protected] if I ever fall behind. I'll review this as soon as possible. Promise :)
Ok I took a look finally. If we can maintain backwards compatibility with what I have installed on my system (Ubuntu 14.04) then I'm happy.
I'm so sorry, but when you get around to updating the PR, you'll get a conflict rebasing this on master. I just migrated the project to GNU autotools. (That included moving...
If NASA engineers end up noticing a comment like that, from a scrappy little telecom library like ours, they'll probably take it as a compliment!
I support this. Are you a Debian maintainer who would be able to be of assistance?
Try comparing `./mistral-7b-instruct-v0.2.Q5_K_M.llamafile --version` with `llamafile --version`. If they're the same, then they *should* behave identically. You can also use `unzip` to extract the gguf file from the llamafile and...
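For anyone who would rather script the extraction step than run `unzip` by hand, here is a minimal sketch in Python. It assumes the llamafile can be opened by the standard `zipfile` module the same way `unzip` opens it, and that the weights are stored as a member whose name ends in `.gguf`. The helper names `list_gguf_members` and `extract_gguf` are made up for illustration and are not part of llamafile.

```python
import zipfile


def list_gguf_members(archive_path):
    """Return the names of archive members that end in .gguf."""
    with zipfile.ZipFile(archive_path) as archive:
        return [name for name in archive.namelist() if name.endswith(".gguf")]


def extract_gguf(archive_path, dest_dir):
    """Extract every .gguf member into dest_dir and return the written paths."""
    with zipfile.ZipFile(archive_path) as archive:
        return [
            archive.extract(name, dest_dir)
            for name in archive.namelist()
            if name.endswith(".gguf")
        ]
```

Calling `list_gguf_members("mistral-7b-instruct-v0.2.Q5_K_M.llamafile")` first is a cheap way to see what is inside before extracting several gigabytes of weights.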
Ragel is only a dependency if you install from git. If you download the tarball, you don't need it. I'd be happy to accept a spec file.
Is your program using `MAP_FIXED`? You can use `blink -s` to system call trace and find out. Chances are it's requesting fixed memory that overlaps with memory Android OS or...
@USBhost Unfortunately no. The K quants were designed to exploit under-utilization of CPU resources when doing matvecs. I tried copying and pasting the `Q5_K_M` code into a tinyBLAS 2-d block-tiling...
The tinyBLAS code upstreamed by Mozilla's llamafile project makes prompt processing go very fast for F32, F16, Q4\_0, and Q8\_0. | model | size | params | backend | threads...