Reduce inference time!!!!
Hey guys, hope you are all doing fine. I am running privateGPT on my M2 Pro and would like to raise an issue about inference time. With a Vicuna 13B 4-bit quantized model it takes about 40 seconds to answer; with a Koala 7B q5_1 quantized model it takes about 20 seconds. Any suggestions for making inference faster? Note that I have already set the batch size to 8 and mlock to true.
I appreciate your time and would welcome any suggestions.
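For reference, here is a minimal sketch of the kind of settings worth tuning. It assumes the llama-cpp-python backend that early privateGPT builds use via LangChain's LlamaCpp wrapper; the model path and parameter values are illustrative examples, not recommendations from the project.

```python
# Sketch only: assumes LangChain's LlamaCpp wrapper over llama-cpp-python.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="models/ggml-vic13b-q4_0.bin",  # placeholder path to your quantized model
    n_ctx=2048,        # context window
    n_batch=512,       # a larger prompt batch than 8 usually speeds up prompt processing
    n_threads=8,       # roughly match the number of performance cores on the M2 Pro
    n_gpu_layers=1,    # in Metal-enabled llama.cpp builds, a value > 0 enables GPU offload on Apple Silicon
    use_mlock=True,    # keep the model resident in RAM
    verbose=False,
)

print(llm("What does this document say about inference time?"))
```

On Apple Silicon, making sure llama-cpp-python was compiled with Metal support and offloading layers to the GPU typically matters far more than the batch size or mlock settings.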
Hi @HarikrishnareddGali, could you share where you got the Vicuna 13B 4-bit quantized model and the Koala 7B q5_1 quantized model, and how you use them within PrivateGPT?
I'd also be interested to know how you got the Vicuna 13B 4-bit quantized model working in privateGPT.
It takes about 30 seconds to answer queries with the default model and source files. I hope there is a way to optimize it somehow. I know it's complicated, but 30 seconds is just too long. I tried up-sizing the VM to 64 cores and 128 GB of RAM, but it didn't help; the machine is barely utilized.
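Since the large VM sits mostly idle, it may be worth measuring how throughput actually scales with the thread count rather than adding cores. This is a rough timing sketch under the same LlamaCpp-wrapper assumption as above; the model path and question are placeholders, and llama.cpp generally stops scaling well past roughly 8 to 16 threads.

```python
# Rough benchmark sketch: time one query at a few thread counts.
import time
from langchain.llms import LlamaCpp

QUESTION = "Summarize the ingested documents."

for threads in (4, 8, 16):
    llm = LlamaCpp(
        model_path="models/your-ggml-model.bin",  # placeholder for whichever GGML model you run
        n_threads=threads,
        n_batch=512,
        verbose=False,
    )
    start = time.time()
    llm(QUESTION)
    print(f"n_threads={threads}: {time.time() - start:.1f}s")
```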