Georgi Gerganov

1015 comments by Georgi Gerganov

Here is a very quick-and-dirty implementation using `ggml`: https://github.com/ggerganov/ggml/pull/96
I also found a bug in the multi-threaded `ggml_cpy()`: https://github.com/ggerganov/ggml/pull/96/files#diff-b4a500ab2765c31526c5541f3e51e21e46990b87d9774cac6f3089db315bdc5bR5655-R5660

Merged in `ggml`: https://github.com/ggerganov/ggml/tree/master/examples/stablelm

There seems to be a bug in the existing StableLM implementation in `ggml`. See the updated README for more details: https://github.com/ggerganov/ggml/tree/master/examples/stablelm#warning The best way to fix this is to compare outputs...

So, I ran the HF transformers implementation and observed the same "increasing magnitude" behaviour as in the `ggml` implementation. To do this, I changed the following line: https://github.com/huggingface/transformers/blob/c2c99dc7ef5edab8f7674a1eb00cf6ac6996fd0f/src/transformers/models/gpt_neox/modeling_gpt_neox.py#L234 to:...

> is it possible this is normal?

Absolutely. It's just my intuitive understanding that the scaling before the softmax layer has the purpose of preventing exactly this kind of...
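To illustrate the point about scaling before the softmax, here is a minimal sketch. It assumes the scaling in question is the usual `1/sqrt(d)` factor from scaled dot-product attention, which the truncated comment does not confirm. The function names, the dimension `d = 256`, and the random inputs are all made up for illustration; this is not the `ggml` or HF transformers code.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, keys, scale=True):
    """Softmax over dot products of q with each key.

    With scale=True the scores are multiplied by 1/sqrt(d) first
    (assumption: this is the scaling the comment refers to).
    """
    d = len(q)
    factor = 1.0 / math.sqrt(d) if scale else 1.0
    scores = [factor * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    return softmax(scores)


# Hypothetical inputs: one query and eight keys with unit-variance entries.
random.seed(0)
d = 256
q = [random.gauss(0, 1) for _ in range(d)]
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(8)]

scaled = attention_weights(q, keys, scale=True)
unscaled = attention_weights(q, keys, scale=False)

# Without scaling, the raw scores grow with d and the softmax tends to
# concentrate almost all weight on a single key.
print("max weight, scaled:  ", max(scaled))
print("max weight, unscaled:", max(unscaled))
```

Since dividing the scores by a constant greater than one can only flatten the softmax output, the largest unscaled weight is never smaller than the largest scaled one. How close the unscaled case gets to a one-hot distribution depends on the inputs.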

I had a quick glance at the GPTQ paper yesterday, but haven't dug into details yet. Do you think it is possible to demonstrate a simple routine for performing quantization...

@mudler Looks great! If you wish to add it to this project, please see how we organized the Go bindings in the [whisper.cpp](https://github.com/ggerganov/whisper.cpp) repo and provide basic CI scripts together...

The `ggml` `large` model is the same as `large-v2`.

Hi! Whisper is the original speech recognition model created and released by OpenAI. It is implemented in Python and supports running both on the CPU and on the GPU. whisper.cpp...

@dkryaklin The color coding logic cannot be part of the `whisper.cpp` library. It has to stay in the user code. The idea is for the user to choose whatever coloring...
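As a rough sketch of what keeping the coloring in user code could look like: the snippet below assumes the application already has a list of `(token_text, probability)` pairs, however it obtained them. That input format, the three-color palette, and the function names `color_for` and `colorize` are hypothetical and not part of the `whisper.cpp` API.

```python
# ANSI escape sequences; the palette is an arbitrary user choice.
RESET = "\033[0m"
PALETTE = ["\033[31m", "\033[33m", "\033[32m"]  # red, yellow, green


def color_for(p, palette=PALETTE):
    """Map a probability in [0, 1] to one palette entry (low -> first, high -> last)."""
    p = min(max(p, 0.0), 1.0)
    idx = min(int(p * len(palette)), len(palette) - 1)
    return palette[idx]


def colorize(tokens, palette=PALETTE):
    """Wrap each token's text in the color chosen for its probability."""
    return "".join(f"{color_for(p, palette)}{text}{RESET}" for text, p in tokens)


# Hypothetical tokens with made-up probabilities.
tokens = [(" Hello", 0.95), (" wrold", 0.20), ("!", 0.60)]
print(colorize(tokens))
```

Because the palette is just an argument, an application can swap in a different scheme (more buckets, 256-color codes, HTML spans) without touching the library.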