Georgi Gerganov

Results: 113 issues by Georgi Gerganov

WIP. This branch exists just for convenience, to build the examples and stay in sync with `whisper.cpp` changes. For more info and the main development progress, see https://github.com/ggerganov/ggml/pull/27 Quantised models (`-q4_0`...

See https://github.com/ggerganov/whisper.cpp/pull/474#issuecomment-1422103941

bug

Idea from: https://github.com/ggerganov/llama.cpp/issues/23#issuecomment-1465308592 We can add a `--cache_prompt` flag that if added will dump the computed KV caches of the prompt processing to the disk in a file with name...

enhancement
help wanted
good first issue
high priority
🦙.
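
A minimal sketch of that idea, assuming a hypothetical `model.eval_prompt()` that runs the prompt and returns its KV cache as bytes (not an existing llama.cpp API); the dump file is keyed by a hash of the prompt, so a repeated prompt skips recomputation entirely:

```python
import hashlib
import os

CACHE_DIR = "prompt-cache"

def cache_path(prompt: str) -> str:
    # Key the dump file by a hash of the prompt text.
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_DIR, digest + ".bin")

def eval_with_prompt_cache(model, prompt: str) -> bytes:
    """Return the prompt's KV cache, reusing an on-disk dump when one exists."""
    path = cache_path(prompt)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()  # prompt seen before: skip prompt processing
    kv = model.eval_prompt(prompt)  # hypothetical call: process prompt, return KV tensors as bytes
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "wb") as f:
        f.write(kv)
    return kv
```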

close #5 #6 #24 We introduce efficient SIMD 4-bit integer quantisation running on the CPU. First, some initial results on M1 Pro: ### Language Models: | Model | Params |...
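
For context, a rough numpy sketch of this style of 4-bit quantisation: blocks of 32 weights, one float scale per block, signed values in [-7, 7]. The abs-max scale here matches the Q4_0 issue further down; the real kernels additionally pack two nibbles per byte and run SIMD, which this sketch omits:

```python
import numpy as np

QK = 32  # weights per quantisation block

def quantize_q4(x: np.ndarray):
    """Blockwise 4-bit quantisation: one float scale plus 32 signed
    4-bit integers per block."""
    blocks = x.reshape(-1, QK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)   # per-block abs-max
    scale = amax / 7.0                                 # maps each block into [-7, 7]
    safe = np.where(scale == 0.0, 1.0, scale)          # avoid div-by-zero on all-zero blocks
    q = np.clip(np.round(blocks / safe), -7, 7).astype(np.int8)
    return scale, q

def dequantize_q4(scale: np.ndarray, q: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

x = np.random.randn(64).astype(np.float32)
scale, q = quantize_q4(x)
print(np.abs(dequantize_q4(scale, q) - x).mean())  # mean round-trip error
```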

WIP. References:
- model: https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py
- tokenizer: https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/t5_tokenizer_model.py
- gated-gelu: https://arxiv.org/pdf/2002.05202.pdf
- google t5x: https://github.com/google-research/t5x (models: https://github.com/google-research/t5x/blob/main/docs/models.md)

FLAN-T5 Small layers:

```python
T5ForConditionalGeneration(
  (shared): Embedding(32128, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 512)
    (block): ModuleList(...
```
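
The gated-gelu referenced above is the GEGLU feed-forward from the linked paper. A minimal PyTorch sketch of that block, using the `wi_0`/`wi_1`/`wo` projection names from the Hugging Face T5 module (plain GELU here for brevity; the HF code uses the tanh approximation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedGELU(nn.Module):
    """Gated-GELU feed-forward (GEGLU): out = (GELU(x W_gate) * (x W_up)) W_down,
    all projections without biases, as in the T5 v1.1 / FLAN-T5 MLP."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # value projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.wo(F.gelu(self.wi_0(x)) * self.wi_1(x))

ff = GatedGELU(d_model=512, d_ff=1024)  # FLAN-T5 Small dimensions
y = ff(torch.randn(1, 8, 512))          # (batch, seq, d_model)
```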

We should probably make a logo for this project. Like an image of a 🦙 and some C++

good first issue
🦙.

It would be great to start doing this kind of quantitative analysis of `ggml`-based inference: https://bellard.org/ts_server/ It looks like Fabrice evaluates the models using something called LM Evaluation Harness: https://github.com/EleutherAI/lm-evaluation-harness...

enhancement
high priority
generation quality
research 🔬
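
A minimal sketch of the kind of number such a harness reports: perplexity, the exponential of the mean negative log-likelihood. This assumes per-token natural-log probabilities collected while running a ggml model over held-out text (the collection step itself is model-specific and omitted):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), given natural-log
    per-token probabilities log p(token_i | tokens_<i)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# e.g. log-probs collected while evaluating a model on held-out text
print(perplexity([-2.1, -0.3, -1.7, -0.9]))  # lower is better
```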

Apply the changes from #252 to [convert-gptq-to-ggml.py](https://github.com/ggerganov/llama.cpp/blob/master/convert-gptq-to-ggml.py) For more info about what this script does, see #301

help wanted
high priority

Currently, in [Q4_0](https://github.com/ggerganov/ggml/pull/27) quantization we choose the scaling factor for each group of 32 weights as `abs(max(x_i))/7`. It is easy to see that this is suboptimal. Consider quantization of the...

help wanted
good first issue
research 🔬
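
One hypothetical illustration of how the abs-max choice could be improved: grid-search scales near it and keep the one with the smallest squared round-trip error for the block. This is a sketch of the idea, not the ggml implementation; the candidate range and count are arbitrary:

```python
import numpy as np

def absmax_scale(block: np.ndarray) -> float:
    return float(np.abs(block).max()) / 7.0

def best_scale(block: np.ndarray, n_candidates: int = 32) -> float:
    """Search scales near the abs-max choice; keep the one minimising the
    squared round-trip error. Never worse than the abs-max scale itself."""
    base = absmax_scale(block)
    if base == 0.0:
        return 0.0  # all-zero block: nothing to quantise

    def rt_err(s: float) -> float:
        q = np.clip(np.round(block / s), -7, 7)
        return float(np.square(block - q * s).sum())

    best, best_err = base, rt_err(base)
    for s in np.linspace(0.5 * base, 1.1 * base, n_candidates):
        err = rt_err(s)
        if err < best_err:
            best, best_err = s, err
    return best

block = np.random.randn(32).astype(np.float32)
print(absmax_scale(block), best_scale(block))
```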

So I am looking at https://github.com/antimatter15/alpaca.cpp and I see they are already running 30B Alpaca models, while we are struggling to run 7B due to the recent tokenizer updates. I...

documentation
help wanted
good first issue
high priority
🦙.