Oleg Klimov

Results: 50 comments of Oleg Klimov

What can we do with this? Memory warning when adding a model. Clear message in the UI about a past OOM event.
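A minimal sketch of what such a pre-flight check could look like, assuming a Linux host and using the GGUF file size as a rough proxy for required memory; the path and the check itself are illustrative, not the project's actual code:

```
# hypothetical pre-flight check: warn if the model file is larger than
# the memory currently available, before attempting to load it
MODEL=./Refact-1_6B-fim/ggml-model-f16.gguf
need_kb=$(du -k "$MODEL" | cut -f1)
free_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
if [ "$need_kb" -gt "$free_kb" ]; then
    echo "warning: model (${need_kb} kB) exceeds available memory (${free_kb} kB)"
fi
```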

> I'm a beginner here... You can start by installing it and trying it out. But unless you're already familiar with CPU inference libraries and LLMs in general, it might take...

An interesting link: https://github.com/ggerganov/llama.cpp/discussions/2948 -- how to convert a HuggingFace model to GGUF format. Examples of GGUFs of all sizes: https://huggingface.co/TheBloke/Llama-2-7B-GGUF
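A rough sketch of the workflow described in that discussion; script names and flags vary between llama.cpp revisions, so check the repo before running, and the model path and quantization type below are just examples:

```
# clone llama.cpp and install the conversion dependencies
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
pip install -r requirements.txt
# convert a local HuggingFace checkout to an fp16 GGUF file
python3 convert.py /path/to/hf-model --outtype f16 --outfile model-f16.gguf
# optionally quantize it down, e.g. to Q4_K_M
make quantize && ./quantize model-f16.gguf model-q4_k_m.gguf q4_k_m
```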

Hi @teleprint-me. Someone is doing the heavy lifting here: https://github.com/ggerganov/llama.cpp/issues/3061

@teleprint-me We are moving away from server-side scratchpads, in favor of client-side scratchpads. The plugins that can do it should land next week or a week after. There still has...

Oh I see, you wrote a Dockerfile! We have no way to test it because we have no AMD GPUs, but maybe we can set up the build process and...

Testing this:

```
./main -m ./Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiple two integers in python" --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0
```

I see speed:...

Xeon 5315Y

| Threads `-t N` | Speed (tokens/s) |
| -------------- | ---------------- |
| -t 2 | 6 |
| -t 4 | 11 |
| -t 8 | 11... |