
Reduce inference time


Hey guys, I hope you're all doing well. I'm trying to run privateGPT on my M2 Pro and would like to raise an issue regarding inference time. With a Vicuna 13B 4-bit quantized model it takes about 40 seconds to answer; with a Koala 7B q5_1 quantized model it takes about 20 seconds. Any suggestions for making inference faster? Note that I have already set the batch size to 8 and mlock is enabled.

I appreciate your time; any suggestions are welcome.

HarikrishnareddGali avatar Jun 13 '23 18:06 HarikrishnareddGali
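
For reference, here is a minimal sketch of how those knobs (batch size, mlock) map onto the langchain `LlamaCpp` wrapper that privateGPT used around this time. The model path, `n_threads`, and `n_gpu_layers` values are assumptions for an M2 Pro, not settings confirmed in this thread:

```python
# Minimal sketch (assumed setup): langchain's LlamaCpp wrapper as used by
# privateGPT circa mid-2023. Path and tuning values below are illustrative.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="models/koala-7b-q5_1.bin",  # hypothetical GGML file
    n_ctx=1000,        # context window matching the default privateGPT .env
    n_batch=8,         # the batch size the issue author reports using
    use_mlock=True,    # keep the model resident in RAM, as described above
    n_threads=8,       # assumption: roughly the performance-core count of an M2 Pro
    n_gpu_layers=1,    # assumption: a value > 0 enables Metal offload in
                       # llama-cpp-python builds compiled with Metal support
    verbose=False,
)
print(llm("Q: What does mlock do? A:"))
```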

Hi @HarikrishnareddGali, could you share where you got the Vicuna 13B 4-bit and Koala 7B q5_1 quantized models, and how to use them within privateGPT?

AjinkyaBankar avatar Jun 16 '23 20:06 AjinkyaBankar
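
For anyone else wondering: the privateGPT of this era reads its model configuration from a `.env` file, so switching to a llama-family GGML model is a matter of pointing `MODEL_TYPE` and `MODEL_PATH` at it. A sketch follows; the variable names mirror the repo's `example.env` from around June 2023, but verify them against your checkout, and the model filename here is hypothetical:

```
# .env sketch; variable names follow privateGPT's example.env circa June 2023
PERSIST_DIRECTORY=db
# Use the llama.cpp backend instead of the default GPT4All-J
MODEL_TYPE=LlamaCpp
# Hypothetical path to the GGML file you downloaded
MODEL_PATH=models/ggml-vicuna-13b-q4_0.bin
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
MODEL_N_CTX=1000
MODEL_N_BATCH=8
TARGET_SOURCE_CHUNKS=4
```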

I would also be interested to know how you got the Vicuna 13B 4-bit quantized model working in privateGPT.

jonyeecoder avatar Jun 19 '23 02:06 jonyeecoder

It takes about 30 seconds to answer queries with the default model and source files. I hope there is a way to optimize it somehow; I know it's complicated, but 30 seconds is just too long. I tried up-sizing the VM to 64 cores and 128 GB RAM, but it didn't help; the machine is barely utilized.

TahirAhmadov avatar Jun 21 '23 23:06 TahirAhmadov
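
A hedged note on why adding cores may not help: llama.cpp token generation is largely memory-bandwidth bound, so thread counts beyond the physical cores of one socket often plateau or even regress. One thing worth checking is whether the thread count is pinned explicitly rather than left to the default. A sketch, assuming the same `LlamaCpp` wrapper as above and a hypothetical model path:

```python
# Sketch: pin llama.cpp's thread count explicitly instead of relying on the
# default. On large VMs, oversubscribed threads (or threads spread across
# NUMA nodes) can leave cores idle while generation stays slow, because
# decoding is bound by memory bandwidth rather than core count.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="models/ggml-model-q4_0.bin",  # hypothetical path
    n_ctx=1000,
    n_batch=8,
    n_threads=16,  # assumption: benchmark values between 8 and the
                   # physical core count of a single socket
    verbose=False,
)
```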