Georgi Gerganov

Results: 420 comments by Georgi Gerganov

Currently, the only way is to manually replace these strings yourself (for example, using regex). Btw, `-ac 768` is better than `-ac 750` - you want the number to be...
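
A minimal sketch of that manual cleanup, assuming the strings in question are bracketed non-speech markers (e.g. `[MUSIC]`) in a whisper.cpp transcript; the file names and the regex pattern are assumptions, not from the comment:

```bash
# run whisper.cpp with the suggested audio context size and write a .txt transcript
./main -m models/ggml-base.en.bin -f samples/jfk.wav -ac 768 -otxt -of transcript

# strip the bracketed markers with a regex (pattern is an assumption)
sed -E 's/\[[A-Za-z_ ]+\]//g' transcript.txt > transcript.clean.txt
```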

Before merging this: the current `Q4_3` format / implementation is not very efficient with ARM NEON. Time per token on M1 Pro:

- `Q4_0` : `48ms`
- `Q4_1` : `55ms` ...

Cool stuff! Here is a sample run on M2 Ultra:

```bash
$ ./sd -m ../models/sd-v1-4-ggml-model-f16.bin -p "a lovely cat" -t 12
[INFO] stable-diffusion.cpp:2191 - loading model from '../models/sd-v1-4-ggml-model-f16.bin'
[INFO] ...
```

Try this patch: https://github.com/ggerganov/llama.cpp/commit/6460f758dbd472653296044d36bed8c4554988f5

On `master` with `Accelerate` I get:

```bash
make clean && LLAMA_NO_METAL=1 make -j && ./llama-bench -m models/mistral-7b-v0.2/ggml-model-fp16.gguf -m models/mistral-7b-v0.2/ggml-model-q8_0.gguf -m models/mistral-7b-v0.2/ggml-model-q4_0.gguf -ngl 0 -n 0
```

| model | size ...

Yes, this assert has to be avoided. The Command-R model has a very large output tensor and its number of elements exceeds the range of `int`. That's why, in order to support it, ...
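
A back-of-the-envelope check of that overflow, assuming an output tensor of roughly 256000 x 12288 elements (Command-R Plus-sized; the exact dimensions here are my assumption):

```bash
# element count of a ~256000 x 12288 output tensor vs. the 32-bit int limit
echo $(( 256000 * 12288 ))   # 3145728000 > INT32_MAX (2147483647)
```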

Here are instructions to trigger this assert:

- clone https://huggingface.co/CohereForAI/c4ai-command-r-plus

```bash
# convert to GGUF
python3 convert-hf-to-gguf.py ~/Data/huggingface/c4ai-command-r-plus/ --outfile models/command-r-plus/ggml-model-f16.gguf --outtype f16

# quantize to Q8_0 + F16 token embeddings
...
```
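
The quantize command is cut off above; a hedged guess at what that step could look like, using the `--token-embedding-type` option of the `quantize` tool (the exact invocation and paths are assumptions):

```bash
# hypothetical: quantize the weights to Q8_0 while keeping token embeddings in F16
./quantize --token-embedding-type f16 \
    models/command-r-plus/ggml-model-f16.gguf \
    models/command-r-plus/ggml-model-q8_0.gguf q8_0
```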

Use the new links from the README, which were updated like an hour or two ago.

Edit: nvm - I see you are using them. I guess these are the...

I think this script should help, but I'm not sure: https://github.com/ggerganov/llama.cpp/issues/324#issuecomment-1476227818

Great task for a `llama.cpp` example! Btw, this is along the lines of the constrained Whisper sampling idea for chess moves: https://twitter.com/ggerganov/status/1640441536403116032

I think this will be another very cool...
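
A rough sketch of what such constrained sampling could look like with llama.cpp's grammar feature; the grammar below is a loose approximation of chess-move notation, not a full SAN definition, and the model path is a placeholder:

```bash
# constrain generation to (approximate) chess-move strings via a GBNF grammar
cat > chess.gbnf << 'EOF'
root    ::= move
move    ::= piece? square capture? square promo?
piece   ::= [KQRBN]
square  ::= [a-h] [1-8]
capture ::= "x"
promo   ::= "=" [QRBN]
EOF

./main -m models/7B/ggml-model-q4_0.gguf -p "Best move:" -n 8 --grammar-file chess.gbnf
```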