Oleg Klimov
I'll try to reproduce
I left 1.6b (regular backend) running for a day, and memory usage settled at 6.19 GB of RAM. I additionally sent 750 completion requests today and it's still 6.19 GB. I don't...
Called for help from @mitya52
Hmm, now I see 11.9 GB on my setup 🤔
It doesn't say "out of memory" for you. 🤔 Not sure how to debug this. @bonswouar what GPU do you have?
Yes, we want CPU support, and a small inference server without many dependencies would be great. The current work is in #77
We'll actually solve this! New plugins with a Rust binary will use a standard API (HF or OpenAI style).
Hi @octopusx, we tested various models on CPU: it takes about 4-8 seconds for a single code completion, even for 1.6b or starcoder 1b, on Apple M1 hardware. Maybe we'll...
Ah I see, that makes total sense. I think the best way to solve this is to add providers to the Rust layer, for the new plugins. We'll release the...