Tooo Slow
Bug description
I tried loading the Vicuna model.
Steps to reproduce
The good news is that it did work. The bad news is that it takes up to several minutes to write a few words.
Environment Information
I have a Ryzen 5 3600 and 16 GB of DDR4 RAM.
Screenshots
No response
Relevant log output
No response
Confirmations
- [X] I'm running the latest version of the main branch.
- [X] I checked existing issues to see if this has already been described.
What OS are you using?
Win 11
Are you using Docker with WSL2?
Also, if the model is too big for your memory (which could be the case with 16 GB + Windows 11 + a 13B model), the model would be loaded into swap, which, depending on your hard drive, could slow everything to a crawl.
Can you try the 7B-native model, for example, to see if you get better performance? Please monitor your memory usage while running the model; if it ever hits the limit and starts using swap, then you're out of luck.
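If you want to see whether the model is spilling into swap while it generates, a small polling script makes it obvious. This is just a sketch using the third-party `psutil` package (not something Serge ships); run it in a separate terminal while the model is generating.

```python
# Minimal RAM/swap monitor (sketch, assumes the third-party "psutil" package is installed).
import time
import psutil

def monitor(interval_s: float = 2.0) -> None:
    while True:
        mem = psutil.virtual_memory()   # physical RAM usage
        swap = psutil.swap_memory()     # swap usage
        print(
            f"RAM {mem.percent:5.1f}% used "
            f"({mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB) | "
            f"swap {swap.percent:5.1f}% used ({swap.used / 2**30:.1f} GiB)"
        )
        # If RAM sits near 100% and swap keeps growing, the model doesn't fit in memory.
        time.sleep(interval_s)

if __name__ == "__main__":
    monitor()
```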
Yes, I did install Docker and WSL2. OK, I will try a 7B model with this repo (I already tried the gpt4all repo and it worked pretty well). Also, I have a Gen 3 NVMe drive. Can I force it to load there instead of the HDD?
This is a comparison between Vicuna on llama.cpp and Serge. GPT4All does work fine with Serge, though.
https://user-images.githubusercontent.com/117460296/230155599-3c0ec82b-8b7d-477a-97ff-4d5c18f8b31a.mp4
https://user-images.githubusercontent.com/117460296/230155807-547d136a-6317-48f6-8cfd-3522e567e7b1.mp4
@Gyro0o Can you try the latest image and check if there's any difference? Also, does your CPU support AVX2?
@gaby Seems like AMD doesn't publish whether or not the Ryzen 5 3600 has AVX2, but multiple sites report it does: https://www.techpowerup.com/cpu-specs/ryzen-5-3600.c2132
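If you'd rather verify AVX2 support directly instead of relying on spec sheets, you can read the CPU flags yourself (on Linux, `grep avx2 /proc/cpuinfo` also works). A quick sketch using the third-party `py-cpuinfo` package, which is not part of Serge:

```python
# Check for AVX2 support (sketch, assumes the third-party "py-cpuinfo" package is installed).
import cpuinfo

flags = cpuinfo.get_cpu_info().get("flags", [])
print("AVX2 supported" if "avx2" in flags else "AVX2 not supported")
```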
I've seen a couple of sites report that Ryzen's FPU is half the width of Intel's for AVX2, so peak throughput may be lower than Intel's: https://www.reddit.com/r/Amd/comments/7s8grk/an_inane_question_about_avx2_and_ryzen/
@Gyro0o A fair amount has changed since you last posted, not a HUGE amount, but it may make a difference. Is it still exceedingly slow?
Xeon E5-2630L V4, 10 cores (AVX2 supported) + 256 GB RAM running Linux + Alpaca 30B or Open Assistant 30B takes about 2-3 minutes to begin generating a response and a few seconds for every single word. It's veeeery slooow...
@QBANIN This is a llama.cpp issue, not Serge. You may want to open an issue there.
I'm using a dual-CPU E5-2630 v3 @ 2.40GHz setup. It's very slow. Is there a way to use the GPU? I have a Titan Black and a 980 Ti that can be dedicated. Is it possible?
@oktaborg This is a llama.cpp and llama-cpp-python issue. Those are the libraries Serge uses to run the models. Work is being done to add GPU support; once they add it, we will add it too.
Is GPU supported now?
Not yet