Georgi Gerganov

Results: 113 issues by Georgi Gerganov

WIP. This branch exists just for convenience, to build the examples and stay in sync with `whisper.cpp` changes. For more info and the main development progress, see https://github.com/ggerganov/ggml/pull/27 Quantised models (`-q4_0`...

See https://github.com/ggerganov/whisper.cpp/pull/474#issuecomment-1422103941

bug

Idea from: https://github.com/ggerganov/llama.cpp/issues/23#issuecomment-1465308592 We can add a `--cache_prompt` flag that if added will dump the computed KV caches of the prompt processing to the disk in a file with name...

enhancement
help wanted
good first issue
high priority
🦙.
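
A minimal sketch of that idea, assuming a hypothetical `model.eval_prompt()` that runs the prompt and returns its KV cache as bytes (not an existing llama.cpp API); the dump file is keyed by a hash of the prompt, so a repeated prompt skips recomputation entirely:

```python
import hashlib
import os

CACHE_DIR = "prompt-cache"

def cache_path(prompt: str) -> str:
    # Key the dump file by a hash of the prompt text.
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_DIR, digest + ".bin")

def eval_with_prompt_cache(model, prompt: str) -> bytes:
    """Return the prompt's KV cache, reusing an on-disk dump when one exists."""
    path = cache_path(prompt)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()  # prompt seen before: skip prompt processing
    kv = model.eval_prompt(prompt)  # hypothetical call: process prompt, return KV tensors as bytes
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "wb") as f:
        f.write(kv)
    return kv
```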

close #5 #6 #24 We introduce efficient SIMD 4-bit integer quantisation running on the CPU. First, some initial results on M1 Pro: ### Language Models: | Model | Params |...
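
For context, a rough numpy sketch of this style of 4-bit quantisation: blocks of 32 weights, one float scale per block, signed values in [-7, 7]. The abs-max scale here matches the Q4_0 issue further down; the real kernels additionally pack two nibbles per byte and run SIMD, which this sketch omits:

```python
import numpy as np

QK = 32  # weights per quantisation block

def quantize_q4(x: np.ndarray):
    """Blockwise 4-bit quantisation: one float scale plus 32 signed
    4-bit integers per block."""
    blocks = x.reshape(-1, QK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)   # per-block abs-max
    scale = amax / 7.0                                 # maps each block into [-7, 7]
    safe = np.where(scale == 0.0, 1.0, scale)          # avoid div-by-zero on all-zero blocks
    q = np.clip(np.round(blocks / safe), -7, 7).astype(np.int8)
    return scale, q

def dequantize_q4(scale: np.ndarray, q: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

x = np.random.randn(64).astype(np.float32)
scale, q = quantize_q4(x)
print(np.abs(dequantize_q4(scale, q) - x).mean())  # mean round-trip error
```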

WIP. References:
- model: https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py
- tokenizer: https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/t5_tokenizer_model.py
- gated-gelu: https://arxiv.org/pdf/2002.05202.pdf
- google t5x: https://github.com/google-research/t5x (models: https://github.com/google-research/t5x/blob/main/docs/models.md)

FLAN-T5 Small layers:

```python
T5ForConditionalGeneration(
  (shared): Embedding(32128, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 512)
    (block): ModuleList(...
```
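
The gated-gelu referenced above is the GEGLU feed-forward from the linked paper. A minimal PyTorch sketch of that block, using the `wi_0`/`wi_1`/`wo` projection names from the Hugging Face T5 module (plain GELU here for brevity; the HF code uses the tanh approximation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedGELU(nn.Module):
    """Gated-GELU feed-forward (GEGLU): out = (GELU(x W_gate) * (x W_up)) W_down,
    all projections without biases, as in the T5 v1.1 / FLAN-T5 MLP."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # value projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.wo(F.gelu(self.wi_0(x)) * self.wi_1(x))

ff = GatedGELU(d_model=512, d_ff=1024)  # FLAN-T5 Small dimensions
y = ff(torch.randn(1, 8, 512))          # (batch, seq, d_model)
```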

We should probably make a logo for this project. Like an image of a 🦙 and some C++

good first issue
🦙.

It would be great to start doing this kind of quantitative analysis of `ggml`-based inference: https://bellard.org/ts_server/ It looks like Fabrice evaluates the models using something called LM Evaluation Harness: https://github.com/EleutherAI/lm-evaluation-harness...

enhancement
high priority
generation quality
research 🔬
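
A minimal sketch of the kind of number such a harness reports: perplexity, the exponential of the mean negative log-likelihood. This assumes per-token natural-log probabilities collected while running a ggml model over held-out text (the collection step itself is model-specific and omitted):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), given natural-log
    per-token probabilities log p(token_i | tokens_<i)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# e.g. log-probs collected while evaluating a model on held-out text
print(perplexity([-2.1, -0.3, -1.7, -0.9]))  # lower is better
```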

Apply the changes from #252 to [convert-gptq-to-ggml.py](https://github.com/ggerganov/llama.cpp/blob/master/convert-gptq-to-ggml.py) For more info about what this script does, see #301

help wanted
high priority

Currently, in [Q4_0](https://github.com/ggerganov/ggml/pull/27) quantization we choose the scaling factor for each group of 32 weights as `abs(max(x_i))/7`. It is easy to see that this is suboptimal. Consider quantization of the...

help wanted
good first issue
research 🔬
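
One hypothetical illustration of how the abs-max choice could be improved: grid-search scales near it and keep the one with the smallest squared round-trip error for the block. This is a sketch of the idea, not the ggml implementation; the candidate range and count are arbitrary:

```python
import numpy as np

def absmax_scale(block: np.ndarray) -> float:
    return float(np.abs(block).max()) / 7.0

def best_scale(block: np.ndarray, n_candidates: int = 32) -> float:
    """Search scales near the abs-max choice; keep the one minimising the
    squared round-trip error. Never worse than the abs-max scale itself."""
    base = absmax_scale(block)
    if base == 0.0:
        return 0.0  # all-zero block: nothing to quantise

    def rt_err(s: float) -> float:
        q = np.clip(np.round(block / s), -7, 7)
        return float(np.square(block - q * s).sum())

    best, best_err = base, rt_err(base)
    for s in np.linspace(0.5 * base, 1.1 * base, n_candidates):
        err = rt_err(s)
        if err < best_err:
            best, best_err = s, err
    return best

block = np.random.randn(32).astype(np.float32)
print(absmax_scale(block), best_scale(block))
```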

So I am looking at https://github.com/antimatter15/alpaca.cpp and I see they are already running 30B Alpaca models, while we are struggling to run 7B due to the recent tokenizer updates. I...

documentation
help wanted
good first issue
high priority
🦙.