Georgi Gerganov
Can we enforce a compile error if `-ffinite-math-only` is used during compilation in order to prevent such issues in the future?
I was thinking of a change in the source code instead: the build system is not standardised, so nothing prevents third-party projects from building with `-ffinite-math-only`. Maybe we can...
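A minimal sketch of one possible source-level guard, assuming GCC or Clang. Both compilers predefine `__FINITE_MATH_ONLY__` to 1 when `-ffinite-math-only` is active, and `-ffast-math` implies that flag. An `#error` on that macro therefore fails the build no matter which build system is used. MSVC does not define this macro, so the guard does nothing there. The runtime helper `ieee_special_values_ok` is a hypothetical name added for illustration and is not part of the codebase.

```c
#include <assert.h>
#include <math.h>

/*
 * Compile-time guard (GCC/Clang only): __FINITE_MATH_ONLY__ is 1 when
 * -ffinite-math-only is in effect, including when it is implied by -ffast-math.
 * On compilers that do not define the macro, this check is skipped.
 */
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
#error "-ffinite-math-only breaks NaN/Inf handling; build without it (or add -fno-finite-math-only)"
#endif

/*
 * Hypothetical runtime self-check: returns 1 if NaN and infinity are still
 * recognised by the standard classification macros. Under -ffinite-math-only
 * the compiler may assume these values never occur and fold such checks to 0.
 * `volatile` keeps the values from being constant-folded at compile time.
 */
static int ieee_special_values_ok(void) {
    volatile float nan_val = NAN;
    volatile float inf_val = INFINITY;
    return isnan(nan_val) && isinf(inf_val) && !isfinite(inf_val);
}
```

The compile-time guard would most likely go in a header that every translation unit includes, so that third-party builds pick it up automatically. The runtime check is only a fallback for compilers that do not expose the macro.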
Looks like the tokenizer tests are failing on Windows for some reason: https://github.com/ggerganov/llama.cpp/actions/runs/9096294810/job/25001393493?pr=7245#step:12:2583
The embedding CI seems to be failing
I haven't tested it either, but it looks good, so feel free to merge
Will be added, though we probably have to merge Jamba (https://github.com/ggerganov/llama.cpp/pull/7531) and then see how to adapt `llama_cache` to support the new Griffin layers
> I am guessing that RPC mode currently does not support mixed CPU and GPU offload, i.e. GPU offload only so if your model doesn't fit in the memory there...
This is an effect from using unified KV cache: https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
No plan at the moment on my side. I haven't figured out a good way to implement this yet
This is not expected