Georgi Gerganov comments

Results: 1015 comments of Georgi Gerganov

ggml : rewrite silu and softmax for cpu

Can we enforce a compile error if `-ffinite-math-only` is used during compilation in order to prevent such issues in the future?

ggml : rewrite silu and softmax for cpu

I was thinking of a change in the source code instead: the build system is not standardised, so nothing prevents 3rd party projects from building with `-ffinite-math-only`. Maybe we can...

Unicode codepoint flags for custom regexs

Looks like the tokenizer tests are failing on Windows for some reason: https://github.com/ggerganov/llama.cpp/actions/runs/9096294810/job/25001393493?pr=7245#step:12:2583

avoid to get prompt in infill mode and embedding mode

The embedding CI seems to be failing

Added support for the ArcticForCausalLM.

I haven't tested it either, but it looks good, so feel free to merge

Support for RecurrentGemma (Gemma with Griffin Architecture)

Will be added, though we probably have to merge Jamba (https://github.com/ggerganov/llama.cpp/pull/7531) and then see how to adapt `llama_cache` to support the new Griffin layers

RPC issues and comments

> I am guessing that RPC mode currently does not support mixed CPU and GPU offload, i.e. GPU offload only, so if your model doesn't fit in the memory there...

Batched inference with greedy sampling yields different completions

This is an effect from using unified KV cache: https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227

Batched inference with greedy sampling yields different completions

No plan at the moment on my side. Haven't figured out a good way to implement this yet

Batched inference with greedy sampling yields different completions

This is not expected