Georgi Gerganov

Results: 420 comments of Georgi Gerganov

Currently, there is no way to disable the GPU completely when the project is built with OpenCL support. I'll think about fixing this. In the meantime, does the information from...

You can easily update `ggml.c` to avoid all GPU calls (CUDA, OpenCL, etc.) if a global flag is set. For example here: https://github.com/ggerganov/whisper.cpp/blob/1f50a7d29f85f221368e81201780e0c8dd631076/ggml.c#L9816-L9825 You can add a `void ggml_gpu_set(bool enable);`...
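A minimal sketch of that approach, assuming a hypothetical flag and accessor names (the actual integration point is the dispatch code in `ggml.c` linked above, which is not reproduced here):

```c
#include <stdbool.h>

// Hypothetical global toggle -- a sketch of the approach described above,
// not the actual ggml API. The GPU dispatch code would consult this flag
// before taking any CUDA/OpenCL path.
static bool g_gpu_enabled = true;

void ggml_gpu_set(bool enable) {
    g_gpu_enabled = enable;
}

bool ggml_gpu_get(void) {
    return g_gpu_enabled;
}

// In the compute dispatch (e.g. where ggml decides between the GPU and CPU
// mat mul), the GPU branch would then be guarded roughly like this:
//
//     if (g_gpu_enabled && /* GPU can handle this op */) {
//         // ... GPU path ...
//     } else {
//         // ... existing CPU path ...
//     }
```

Calling `ggml_gpu_set(false)` at startup would then force every operation onto the CPU path without rebuilding the project.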

I'll probably make a new one soon, yes

@ikawrakow Just made a full cuBLAS run on 13B using `Q4_3`, without RMSE optimization and with `output` in F16 precision, and got: `5.3075`

```
main: seed = 1682170268
llama.cpp: loading model...
```

My result for 13B, using `Q4_3` with RMSE optimization + F16 output, is: `5.2962`. I think this result makes more sense, since it is in line with my expectation that I...

> @ggerganov Are these results with or without the changes you made to `Q4_3` after I opened this PR (and reported the results)?

It includes all changes from today related...

I think we cannot expect cuBLAS and OpenBLAS to produce exactly the same results, because cuBLAS dequantizes `x` to F16, casts `y` to F16, and performs the mat mul in F16, while...
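To illustrate why the two paths cannot match bit-for-bit, here is a small self-contained sketch (not ggml code) that rounds a float to F16 precision by keeping only 10 mantissa bits; values such as 0.1 are not exactly representable in F16, so a dot product computed from the rounded operands drifts from the F32 result:

```c
#include <stdint.h>
#include <string.h>

// Round a float to roughly the nearest F16-representable value by keeping
// only 10 mantissa bits (round-half-up; ignores exponent range, subnormals
// and infinities -- a sketch for illustration, not a full f32->f16 cast).
float f16_round(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof(u));
    u += 0x1000;        // round at bit 13 (the first discarded mantissa bit)
    u &= 0xFFFFE000u;   // zero the low 13 mantissa bits, keeping 10
    memcpy(&x, &u, sizeof(x));
    return x;
}

// Dot product with the operands first rounded to F16 precision, mimicking
// a backend that casts its inputs to F16 before multiplying.
float dot_as_f16(const float *a, const float *b, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += f16_round(a[i]) * f16_round(b[i]);
    }
    return sum;
}
```

Feeding both paths identical inputs (e.g. vectors of values like `0.1f`) yields slightly different sums, analogous to the small perplexity differences seen between the two backends.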

@ExtReMLapin This copy is used only in the `speculative` example. Even if it helps there, it won't have any effect on the general use case. Still, a PR is welcome...

> if I were to make improvements to the grammar engine, would those speed improvements show up in our current bank of benchmarks?

We don't have benchmarks for this yet....

This breaks the "real-time" stream usage. For example, see the videos here: https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.nvim