Oleg Klimov

Results: 50 comments of Oleg Klimov

What can we do with this? Memory warning when adding a model. Clear message in the UI about a past OOM event.
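A minimal sketch of what such a pre-flight check could look like, assuming a Linux host and using the GGUF file size as a rough proxy for required memory; the path and the check itself are illustrative, not the project's actual code:

```
# hypothetical pre-flight check: warn if the model file is larger than
# the memory currently available, before attempting to load it
MODEL=./Refact-1_6B-fim/ggml-model-f16.gguf
need_kb=$(du -k "$MODEL" | cut -f1)
free_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
if [ "$need_kb" -gt "$free_kb" ]; then
    echo "warning: model (${need_kb} kB) exceeds available memory (${free_kb} kB)"
fi
```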

> I'm a beginner here... You can start by installing it and trying it out. But unless you're already familiar with CPU inference libraries and LLMs in general, it might take...

An interesting link: https://github.com/ggerganov/llama.cpp/discussions/2948 -- how to convert a HuggingFace model to GGUF format. Examples of GGUFs of all sizes: https://huggingface.co/TheBloke/Llama-2-7B-GGUF
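A rough sketch of the workflow described in that discussion; script names and flags vary between llama.cpp revisions, so check the repo before running, and the model path and quantization type below are just examples:

```
# clone llama.cpp and install the conversion dependencies
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
pip install -r requirements.txt
# convert a local HuggingFace checkout to an fp16 GGUF file
python3 convert.py /path/to/hf-model --outtype f16 --outfile model-f16.gguf
# optionally quantize it down, e.g. to Q4_K_M
make quantize && ./quantize model-f16.gguf model-q4_k_m.gguf q4_k_m
```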

Hi @teleprint-me. Someone is doing the heavy lifting here: https://github.com/ggerganov/llama.cpp/issues/3061

@teleprint-me We are moving away from server-side scratchpads, in favor of client-side scratchpads. The plugins that can do it should land next week or a week after. There still has...

Oh I see, you wrote a Dockerfile! We have no way to test it because we have no AMD GPUs, but maybe we can set up the build process and...

Testing this:

```
./main -m ./Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiple two integers in python" --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0
```

I see speed:...

Xeon 5315Y

| Threads `-t N` | Speed (tokens/s) |
| -------------- | ---------------- |
| -t 2 | 6 |
| -t 4 | 11 |
| -t 8 | 11... |