
CPU Stats for when it's possible

Open eren23 opened this issue 1 year ago • 2 comments

Running sentence-transformers on a CPU is also possible for various tasks, especially in consumer-grade libraries. People run these models without any GPU acceleration, which might be worth mentioning in the section.

We have been using a sentence-transformer since the beginning of our project, and even though it's a small open-source project, all the users I know run it on their CPUs.

eren23 avatar May 22 '23 05:05 eren23

Weirdly, I tried it myself and it was considerably slower: about 20x slower. But I think that would be a really good section to add, especially since we are also adding more info on llama.cpp (which we are starting to benchmark now). Give us 2 weeks and we'll see if we can do it.

waleedkadous avatar May 22 '23 18:05 waleedkadous

I would have appreciated finding this number too. From personal experience (see: https://www.kaggle.com/code/lucasmorin/mistral-7-b-instruct-electricity-co2-consumption), the run time for the same query is about 10x longer on CPU, which generally makes CPU usage impractical (or impossible).

lcrmorin avatar Dec 29 '23 11:12 lcrmorin
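
The slowdown figures quoted in this thread (roughly 10x to 20x) are the kind of number a small timing harness can produce. The following is a minimal sketch, not the method anyone in the thread used: `median_seconds`, `slowdown`, and `dummy_encode` are hypothetical names, and `dummy_encode` is only a stand-in so the code runs without a model installed. With sentence-transformers available, one would presumably pass a model's `encode` method instead, once for a CPU-loaded model and once for a GPU-loaded one.

```python
import statistics
import time
from typing import Callable, Sequence


def median_seconds(encode: Callable[[Sequence[str]], object],
                   sentences: Sequence[str],
                   repeats: int = 5,
                   warmup: int = 1) -> float:
    """Return the median wall-clock time of encode(sentences) over `repeats` runs."""
    for _ in range(warmup):
        encode(sentences)  # warm-up runs are not timed (lazy init, caches)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        encode(sentences)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def slowdown(slow_seconds: float, fast_seconds: float) -> float:
    """How many times slower the first measurement is than the second."""
    return slow_seconds / fast_seconds


def dummy_encode(sentences: Sequence[str]) -> list:
    # Hypothetical stand-in for a real embedding call (assumption, not a real API).
    return [len(s.split()) for s in sentences]


if __name__ == "__main__":
    sentences = ["an example sentence to embed"] * 256
    t = median_seconds(dummy_encode, sentences)
    print(f"median time per batch: {t:.6f} s")
    # Assumed usage with a real model (requires sentence-transformers):
    #   from sentence_transformers import SentenceTransformer
    #   cpu_t = median_seconds(SentenceTransformer(name, device="cpu").encode, sentences)
    #   gpu_t = median_seconds(SentenceTransformer(name, device="cuda").encode, sentences)
    #   print(slowdown(cpu_t, gpu_t))
```

Taking the median over several runs after a warm-up keeps one-off costs such as model loading or cache misses out of the ratio, which matters when comparing a 10x figure against a 20x one.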