Georgi Gerganov
Here is a very quick and dirty implementation using `ggml`: https://github.com/ggerganov/ggml/pull/96. I also found a bug in the multi-threaded `ggml_cpy()`: https://github.com/ggerganov/ggml/pull/96/files#diff-b4a500ab2765c31526c5541f3e51e21e46990b87d9774cac6f3089db315bdc5bR5655-R5660
Merged in `ggml`: https://github.com/ggerganov/ggml/tree/master/examples/stablelm
There seems to be a bug in the existing StableLM implementation in `ggml`. See the updated README for more details: https://github.com/ggerganov/ggml/tree/master/examples/stablelm#warning. The best way to fix this is to compare outputs...
So, I ran the HF transformers implementation and observed the same "increasing magnitude" behaviour as in the `ggml` implementation. To do this, I changed the following line: https://github.com/huggingface/transformers/blob/c2c99dc7ef5edab8f7674a1eb00cf6ac6996fd0f/src/transformers/models/gpt_neox/modeling_gpt_neox.py#L234 to:...
> is it possible this is normal?

Absolutely. It's just my intuitive understanding that the scaling before the softmax layer has the purpose of preventing exactly this kind of...
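To illustrate the intuition about scaling before the softmax, here is a minimal sketch in plain Python. It is not taken from `ggml` or from the HF transformers code; the function names, the dimension `d=256`, and the random inputs are assumptions made for the example. It compares attention weights computed from raw dot products with weights computed after the usual `1/sqrt(d)` scaling.

```python
import math
import random


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, keys, scale):
    # One query against several keys: scaled dot products, then softmax.
    logits = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    return softmax(logits)


def demo(d=256, n_keys=8, seed=0):
    # Hypothetical setup: unit-variance random query and keys of dimension d.
    rng = random.Random(seed)
    q = [rng.gauss(0, 1) for _ in range(d)]
    keys = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_keys)]
    unscaled = attention_weights(q, keys, 1.0)
    scaled = attention_weights(q, keys, 1.0 / math.sqrt(d))
    return unscaled, scaled


if __name__ == "__main__":
    unscaled, scaled = demo()
    print("largest weight without scaling:", max(unscaled))
    print("largest weight with 1/sqrt(d): ", max(scaled))
```

With unit-variance inputs the raw dot products have a standard deviation of about `sqrt(d)`, so the unscaled softmax tends to put nearly all of its weight on one key, while the scaled version stays closer to uniform. This only shows the effect of the scaling factor; it says nothing about whether the growing magnitudes discussed here are a bug.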
I had a quick glance at the GPTQ paper yesterday, but haven't dug into details yet. Do you think it is possible to demonstrate a simple routine for performing quantization...
@mudler Looks great! If you wish to add it to this project, please see how we organized the Go bindings in the [whisper.cpp](https://github.com/ggerganov/whisper.cpp) repo and provide basic CI scripts together...
The ggml `large` model is the same as `large-v2`.
Hi! Whisper is the original speech recognition model created and released by OpenAI. It is implemented in Python and supports running both on the CPU and on the GPU. whisper.cpp...
@dkryaklin The color coding logic cannot be part of the `whisper.cpp` library. It has to stay in the user code. The idea is for the user to choose whatever coloring...
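As a rough illustration of keeping the coloring in user code, the sketch below assumes the caller already has each token's text and a confidence value in the range 0 to 1. The palette, the function names `color_for` and `colorize`, and the choice of ANSI 256-color escape codes are all hypothetical and not part of the `whisper.cpp` API.

```python
RESET = "\033[0m"

# Hypothetical palette of ANSI 256-color codes, from low confidence (red)
# to high confidence (green). A user could swap in any scheme they like.
PALETTE = [196, 208, 226, 118, 46]


def color_for(p, palette=PALETTE):
    # Clamp the confidence to [0, 1] and map it to a palette bucket.
    p = min(max(p, 0.0), 1.0)
    idx = min(int(p * len(palette)), len(palette) - 1)
    return palette[idx]


def colorize(tokens):
    # tokens: iterable of (text, confidence) pairs supplied by the caller.
    return "".join(
        f"\033[38;5;{color_for(p)}m{text}{RESET}" for text, p in tokens
    )


if __name__ == "__main__":
    print(colorize([("Hello", 0.95), (" wrld", 0.20), ("!", 0.60)]))
```

Because the mapping lives entirely on the caller's side, changing the palette or switching to HTML or another output format requires no change to the library.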