clulece

Results: 4 comments by clulece

> 4bit is twice as fast as 8bit because llama.cpp is efficient enough to be memory bound, not compute bound, even on modest processors. I have not seen comparisons of...

When this feature is implemented, could you please also add a command-line option to disable the navigation bars? This option is not critical, but it would help give people...

> You should theoretically be able to run the same convert and quantize scripts on that model and use them with llama.cpp. I tried to convert the recreated weights using...

> Heya, do you mind laying out the steps you've done to get where you are now? I'm trying to do the same thing but can't get past the initial...