LostRuins Concedo
Just wondering, for all those who have tried: how much speedup do you get in the batched **prompt eval timings** vs OpenBLAS (not perplexity calculations)? Would be good to benchmark...
> I would bring up CLBlast as it's been implemented over at https://github.com/LostRuins/koboldcpp/ and isn't Nvidia-exclusive, but from my experience, speed-ups are minor or it just ends up being slower...
@slaren @0cc4m we've solved the issue - apparently there was code in llama.cpp that made the graph switch to single-threaded mode during BLAS calculations - understandable for...
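For context, a minimal sketch of the kind of check being described. The helper name `ggml_compute_forward_mul_mat_use_blas` matches the ggml of that era, but the surrounding structure here is an approximation, not the exact upstream code:

```cpp
// Sketch of the pattern in ggml's graph compute: when a matmul node will be
// handled by BLAS, the node is planned with a single task, since the BLAS
// library threads internally and extra ggml threads would just spin.
for (int i = 0; i < cgraph->n_nodes; i++) {
    struct ggml_tensor * node = cgraph->nodes[i];

    int n_tasks = n_threads;
    if (node->op == GGML_OP_MUL_MAT &&
        ggml_compute_forward_mul_mat_use_blas(node->src0, node->src1, node)) {
        n_tasks = 1; // the BLAS call parallelizes itself
    }
    node->n_tasks = n_tasks;
}
```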
In case anyone is concerned - 0cc4m is the main developer of the code relating to the CLBlast kernels and implementation, and we are fine with this code being merged...
@philpax No, there are no issues determining ftype in the file for me so far. The modulo for ftype is only required for ggml magic files (`0x67676d6c`), not ggjt (`0x67676a74`),...
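To illustrate the demultiplexing in question - a hedged sketch, where the factor of 1000 matches `GGML_QNT_VERSION_FACTOR` in upstream ggml, but the helper itself is hypothetical:

```cpp
#include <cstdint>

// Magic values as they appear in the thread.
constexpr uint32_t MAGIC_GGML = 0x67676d6c; // "ggml" (unversioned header)
constexpr uint32_t MAGIC_GGJT = 0x67676a74; // "ggjt" (versioned header)

// Matches GGML_QNT_VERSION_FACTOR in upstream ggml.
constexpr uint32_t QNT_VERSION_FACTOR = 1000;

// Hypothetical helper: recover the real ftype from the stored field.
// Old "ggml" files multiplex the quantization version into the ftype field
// (stored = ftype + qnt_version * 1000), so a modulo is needed; "ggjt"
// files carry a separate file version and store ftype as-is.
uint32_t resolve_ftype(uint32_t magic, uint32_t stored_ftype) {
    if (magic == MAGIC_GGML) {
        return stored_ftype % QNT_VERSION_FACTOR;
    }
    return stored_ftype;
}
```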
@philpax I'm personally more in the camp of "if it's not broken, don't fix it" - so given that the version+ftype multiplex was already added to existing ggml and...
But it has to be consistent. Leaving the standard freely extensible but undefined can quickly lead to fractured formats, as each implementation adds its own keys. That's how you get...
Also, this really isn't a llama.cpp issue unless it's a tokenizer problem. You can confirm whether the input tokens match the vocab - e.g. by dumping them as sketched below.
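A hedged sketch of that check, using the `llama.h` API of the time (`llama_tokenize` / `llama_token_to_str`; treat the exact signatures as assumptions):

```cpp
#include <cstdio>
#include <string>
#include <vector>

#include "llama.h"

// Dump the token ids and their vocab strings for a prompt, so the ids can
// be compared against the model's vocab or another implementation's output.
void dump_tokens(llama_context * ctx, const std::string & text) {
    std::vector<llama_token> tokens(text.size() + 8);
    const int n = llama_tokenize(ctx, text.c_str(), tokens.data(),
                                 (int) tokens.size(), /*add_bos=*/true);
    if (n < 0) {
        fprintf(stderr, "tokenization failed\n");
        return;
    }
    tokens.resize(n);
    for (llama_token id : tokens) {
        printf("%6d -> '%s'\n", id, llama_token_to_str(ctx, id));
    }
}
```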
If the patch is applied as a polyfill, it should be forwards compatible.
Ah, I get that. But I would say that *this repo* has sort of become *the* de facto standard, as all the implementations I know of are based on the code here....