Oleg Klimov
First token, 551-token prompt:

* 1172ms on M1
* 25404ms on Xeon 5315Y

I'd say that's the main problem for adoption of this. A 551-token prompt isn't even that big, normally...
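To put those numbers in perspective, a quick back-of-the-envelope calculation using only the figures quoted above shows the per-token prompt-processing gap:

```python
# Back-of-the-envelope math from the numbers quoted above.
prompt_tokens = 551
m1_ms = 1172      # time to first token on M1
xeon_ms = 25404   # time to first token on Xeon 5315Y

print(f"M1:   {m1_ms / prompt_tokens:.2f} ms per prompt token")    # ~2.13 ms
print(f"Xeon: {xeon_ms / prompt_tokens:.2f} ms per prompt token")  # ~46.11 ms
print(f"Ratio: {xeon_ms / m1_ms:.1f}x slower on the Xeon")         # ~21.7x
```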
I tried Starcoder 1b, converted by TabbyML: https://huggingface.co/TabbyML/StarCoder-1B/tree/main/ggml

```
"-m", "starcoder-1b-q8_0.gguf",
 897.71 ms /  557 tokens (    1.61 ms per token,   620.47 tokens per second)
1334.68 ms /   49 runs...
```
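For anyone who wants to reproduce a timing like this without the CLI, here is a minimal sketch using the llama-cpp-python bindings (an assumption on my part; the numbers above come from the llama.cpp binary itself, and the prompt string here is made up):

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Load the same q8_0 GGUF file referenced above.
llm = Llama(model_path="starcoder-1b-q8_0.gguf")

prompt = "def fibonacci(n):"  # hypothetical prompt, just for timing
start = time.time()
out = llm(prompt, max_tokens=49)  # 49 generated tokens, as in the log above
print(f"total: {(time.time() - start) * 1000:.0f} ms")
print(out["choices"][0]["text"])
```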
OK it works nicely! So all the credit goes to @ds5t5, right?
@teleprint-me oh I see you've converted the 1.6b model in several quantizations, thank you for that! (I thought your tests were for llama, the name is confusing)
Makes sense!
Interesting!

> This approach is better for summarizing and identifying context

Are you saying Atom might help to fill the model's context, to help it come up with a better...
It's kind of expected: unless we want to hack into the model download process or parse its text output, we don't have a way to forward the progress into the GUI...
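For the record, the "parse text output" option could look something like this rough sketch, with a made-up downloader command and progress format (none of this is actual refact code):

```python
import re
import subprocess

# Hypothetical progress line format, e.g. "progress: 42%".
# The real tool and its output would differ; this only sketches the approach.
PROGRESS_RE = re.compile(r"(\d{1,3})%")

proc = subprocess.Popen(
    ["model-downloader", "--model", "some-model"],  # made-up command
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)
for line in proc.stdout:
    match = PROGRESS_RE.search(line)
    if match:
        percent = int(match.group(1))
        # Forward to the GUI here (websocket, callback, etc.).
        print(f"download progress: {percent}%")
```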
We have this PR: https://github.com/smallcloudai/refact/pull/252 which we'll test somehow (we don't have any AMD GPUs), or at least we'll set up an auto docker build and someone will test it :D
Maybe when rebooting the computer, @psyrtsov wants self-hosting to auto-start again?
We have sharding, so this should be solved! (not yet in docker as of today)
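For context, one common way to shard a model across several GPUs is HF transformers' `device_map="auto"` (backed by accelerate). This is an assumption about the setup, not necessarily how refact implements its sharding:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "auto" lets accelerate split the weights across all visible GPUs.
# Model name is illustrative; substitute whatever the server actually loads.
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoderbase-1b",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoderbase-1b")
```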