Tooo Slow
Bug description
I tried loading the Vicuna model.
Steps to reproduce
The good news is that it did work. The bad news is that it takes up to several minutes to write a few words.
Environment Information
I have a Ryzen 5 3600 and 16 GB of DDR4 RAM.
Screenshots
No response
Relevant log output
No response
Confirmations
- [X] I'm running the latest version of the main branch.
- [X] I checked existing issues to see if this has already been described.
What OS are you using?
Win 11
Are you using Docker with WSL2?
Also, if the model is too big for your memory (which could be the case with 16 GB + Windows 11 + a 13B model), the model would be loaded into swap, which, depending on your hard drive, could slow everything to a crawl.
Can you try the 7B-native model, for example, to see if you get better performance? Please monitor your memory usage while running the model; if it ever hits the limit and starts using swap, then you're out of luck.
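If you want to see whether the model is spilling into swap while it generates, a small polling script makes it obvious. This is just a sketch using the third-party `psutil` package (not something Serge ships); run it in a separate terminal while the model is generating.

```python
# Minimal RAM/swap monitor (sketch, assumes the third-party "psutil" package is installed).
import time
import psutil

def monitor(interval_s: float = 2.0) -> None:
    while True:
        mem = psutil.virtual_memory()   # physical RAM usage
        swap = psutil.swap_memory()     # swap usage
        print(
            f"RAM {mem.percent:5.1f}% used "
            f"({mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB) | "
            f"swap {swap.percent:5.1f}% used ({swap.used / 2**30:.1f} GiB)"
        )
        # If RAM sits near 100% and swap keeps growing, the model doesn't fit in memory.
        time.sleep(interval_s)

if __name__ == "__main__":
    monitor()
```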
Yes, I did install Docker and WSL2. OK, I will try a 7B model with this repo (I already tried the gpt4all repo and it worked pretty well). Also, I have a Gen 3 NVMe drive. Can I force it to load there instead of the HDD?
This is a comparison between Vicuna on llama.cpp and Serge. GPT4All does work fine with Serge, though.
https://user-images.githubusercontent.com/117460296/230155599-3c0ec82b-8b7d-477a-97ff-4d5c18f8b31a.mp4
https://user-images.githubusercontent.com/117460296/230155807-547d136a-6317-48f6-8cfd-3522e567e7b1.mp4
@Gyro0o Can you try the latest image and check if there's any difference? Also, does your CPU support AVX2?
@gaby Seems like AMD doesn't publish whether or not the Ryzen 5 3600 has AVX2, but multiple sites report it does: https://www.techpowerup.com/cpu-specs/ryzen-5-3600.c2132
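If you'd rather verify AVX2 support directly instead of relying on spec sheets, you can read the CPU flags yourself (on Linux, `grep avx2 /proc/cpuinfo` also works). A quick sketch using the third-party `py-cpuinfo` package, which is not part of Serge:

```python
# Check for AVX2 support (sketch, assumes the third-party "py-cpuinfo" package is installed).
import cpuinfo

flags = cpuinfo.get_cpu_info().get("flags", [])
print("AVX2 supported" if "avx2" in flags else "AVX2 not supported")
```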
I've seen a couple of sites report that Ryzen's FPU is half the width of Intel's for AVX2, so peak throughput may be lower than Intel's: https://www.reddit.com/r/Amd/comments/7s8grk/an_inane_question_about_avx2_and_ryzen/
@Gyro0o A fair amount has changed since you last posted, not a HUGE amount, but it may make a difference. Is it still exceedingly slow?
Xeon E5-2630L V4, 10 cores (AVX2 supported) + 256 GB RAM running Linux + Alpaca 30B or Open Assistant 30B takes about 2-3 minutes to begin generating a response and a few seconds for every single word. It's veeeery slooow...
@QBANIN This is a llama.cpp issue, not Serge. You may want to open an issue there.
I'm using a dual-CPU E5-2630 v3 @ 2.40GHz setup. It's very slow. Is there a way to use the GPU? I have a Titan Black and a 980 Ti that can be dedicated. Is it possible?
@oktaborg This is a llama.cpp and llama-cpp-python issue. Those are the libraries Serge uses to run the models. Work is being done to add GPU support; once they add it, we will add it too.
Is GPU supported now?
Not yet