
CPU utilization

Open SH1436 opened this issue 1 year ago • 11 comments

CPU utilization appears to be capped at 20%. Is there a way to increase CPU utilization and thereby enhance performance?

SH1436 avatar Jun 11 '23 06:06 SH1436

Hello, I assure you that you are not alone in this. This thing barely uses any CPU or GPU, which is frustrating because it makes the process slow. It would be worth checking whether there is a way to increase the resources the script uses.

Univers4craft avatar Jun 11 '23 13:06 Univers4craft

It is not capped at 20%. I am successfully running it at 1600%. Please provide more information and I may be able to help.

JasonMaggard avatar Jun 12 '23 14:06 JasonMaggard

Hello, to put it simply, when I ask GPT a question, it takes ages to respond, and the processor or GPU is not being utilized at 100% or even 50%.

[Screenshot: Capture d’écran du 2023-06-12 20-00-14]

Univers4craft avatar Jun 12 '23 18:06 Univers4craft

By default, the process will only use 4 threads. Try setting n_threads.

llm = GPT4All(model=model_path, n_threads=16, n_ctx=model_n_ctx, backend='llama', verbose=False)

I have CPU @ 99% and 1600% in top.

[Screenshot: 2023-06-12 at 2 46 53 PM]

JasonMaggard avatar Jun 12 '23 18:06 JasonMaggard
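Jason's fix, as a minimal sketch: pick a thread count from the logical cores the OS reports and pass it as `n_threads`. The model path, context size, and backend below are illustrative placeholders standing in for the thread's values, not verified against any particular langchain version, so the actual `GPT4All(...)` call is left commented out.

```python
import os

# GPT4All's default of 4 threads is what leaves most cores idle;
# os.cpu_count() reports logical cores (hyperthreads included).
n_threads = os.cpu_count() or 4

# Keyword arguments mirroring the line above; model path and n_ctx
# are placeholders, not values confirmed by this thread.
llm_kwargs = {
    "model": "models/ggml-gpt4all-j-v1.3-groovy.bin",
    "n_threads": n_threads,
    "n_ctx": 2048,
    "backend": "llama",
    "verbose": False,
}
# llm = GPT4All(**llm_kwargs)  # requires the model file to be present
print(llm_kwargs["n_threads"])
```

Leaving one or two cores unused (e.g. `n_threads = (os.cpu_count() or 4) - 2`) keeps the rest of the system responsive while a query runs.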

Is this configuration located in the .env file?

Univers4craft avatar Jun 12 '23 18:06 Univers4craft

In the query.py.

JasonMaggard avatar Jun 12 '23 19:06 JasonMaggard

> In the query.py.

There is no such file in this repo. I've got the same issue with CPU utilisation.

habib-the-sweet avatar Jun 12 '23 19:06 habib-the-sweet

In this repo it's privateGPT.py, line 38. Also, set the number to the number of cores you have.

I'm an end user trying to help. So chill, habib. I'm doing this out of kindness. You can also search your repo...

JasonMaggard avatar Jun 12 '23 20:06 JasonMaggard

Thanks for your contribution Jason - greatly appreciated.

In my case line 36 reads:

llm = GPT4All(model=model_path, n_ctx=model_n_ctx, backend='gptj', callbacks=callbacks, verbose=False)

So, I added the n_threads=12 parameter (12 physical and 24 virtual cores) to line 36 and it now reads:

llm = GPT4All(model=model_path, n_threads=12, n_ctx=model_n_ctx, backend='gptj', verbose=False)

No complaints on startup:

Using embedded DuckDB with persistence: data will be stored in: db
Found model file.
gptj_model_load: loading model from 'models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
gptj_model_load: ggml ctx size = 5401.45 MB
gptj_model_load: kv self size = 896.00 MB
gptj_model_load: ................................... done
gptj_model_load: model size = 3609.38 MB / num tensors = 285

Enter a query:

However, CPU utilization curiously remained the same at 20%.

Upon further investigation, using Resource Monitor, I noticed that 6 of the 24 logical cores are actually working very hard, whilst the others occasionally blip. Increasing or decreasing the n_threads value does not reflect any change to the number of cores showing activity.

It's as though the repo I'm using is ignoring the n_threads parameter altogether.

Have I implemented it incorrectly?

SH1436 avatar Jun 13 '23 04:06 SH1436

What is your version of langchain? Are you up to date on the repo? v 0.179 does not use all of the threads.

JasonMaggard avatar Jun 13 '23 13:06 JasonMaggard

Langchain version was 0.0.177 so updated to the latest repo and in the process got langchain v 0.197.

Ingest.py utilized 100% CPU but queries were still capped at 20% (6 virtual cores in my case).

However, when I added n_threads=24, to line 39 of privateGPT.py CPU utilization shot up to 100% with all 24 virtual cores working :)

Line 39 now reads: llm = GPT4All(model=model_path, n_threads=24, n_ctx=model_n_ctx, backend='gptj', n_batch=model_n_batch, callbacks=callbacks, verbose=False)

Thanks for your help Jason :)

SH1436 avatar Jun 13 '23 20:06 SH1436
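For reference, privateGPT reads settings like `model_n_ctx` and `model_n_batch` from a .env file via python-dotenv. A sketch of the same idea using plain environment variables with fallback defaults; the names `MODEL_N_CTX` and `MODEL_N_BATCH` mirror privateGPT's example.env, while `N_THREADS` is a hypothetical addition for this fix:

```python
import os

# Fallback defaults are used when the variables are unset; the values
# here are illustrative, not prescriptive.
model_n_ctx = int(os.environ.get("MODEL_N_CTX", "2048"))
model_n_batch = int(os.environ.get("MODEL_N_BATCH", "8"))
n_threads = int(os.environ.get("N_THREADS", str(os.cpu_count() or 4)))

print(model_n_ctx, model_n_batch, n_threads)
```

Wiring `n_threads` through the environment like this avoids hard-coding a machine-specific value into privateGPT.py.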

So, I've tested n_threads on AWS EC2, and so far the optimal value is 48. I don't understand why, but with 72 CPUs and 96 CPUs the response speed slowed down instead of increasing, even though CPU utilization can go to 7000% and 9000%... Any insights @SH1436?

sshu2017 avatar Jun 24 '23 07:06 sshu2017
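sshu2017's observation that throughput peaks at 48 threads and then regresses is consistent with thread oversubscription (cache thrashing, NUMA traffic). One way to find the sweet spot empirically is a small timing harness; `make_llm` below is a hypothetical factory (n_threads in, an object with an `invoke(prompt)` method out), not an API from this thread:

```python
import time

def benchmark_thread_counts(make_llm, prompts, thread_counts):
    """Return mean seconds per prompt for each candidate thread count.

    make_llm is a hypothetical factory: n_threads -> LLM with .invoke(prompt).
    More threads is not always faster; past the hardware's sweet spot,
    contention makes each token slower.
    """
    means = {}
    for n in thread_counts:
        llm = make_llm(n)
        start = time.perf_counter()
        for prompt in prompts:
            llm.invoke(prompt)
        means[n] = (time.perf_counter() - start) / len(prompts)
    return means

def best_thread_count(means):
    # Smallest mean latency wins.
    return min(means, key=means.get)
```

Running this once with, say, `thread_counts=[24, 48, 72, 96]` and a couple of representative prompts would pin down the optimum for a given instance type.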

@sshu2017 Can you tell me what the average response time to a question is with this? I get close to 20-45 seconds on an N-series Azure VM. Also, the accuracy doesn't seem to be good. I'm on Windows 10; all the libraries and setup worked as expected, no issues there. Am I missing something?

abhishekrai43 avatar Jun 27 '23 08:06 abhishekrai43

Hi @abhishekrai43 May I ask how you measured the accuracy? And does yours generate a full response?

Thanks in advance

samanemami avatar Jun 27 '23 12:06 samanemami

@samanemami I got 5 people to ask it 50 questions. Accuracy came out to be close to 50-60%. No, it can cut the answer off whenever it wants. It prints context three times the size of the answer, so I think the context is what eats up the token limit and cuts off the answer. Sometimes it takes 97 seconds to answer on a 16GB Windows machine. Is that to be expected?

abhishekrai43 avatar Jun 27 '23 12:06 abhishekrai43

Thanks @abhishekrai43

Yes, it is about ~90 seconds, and I managed to reduce it to ~45 sec with more threads. I wanted to reduce the time further by varying the batch size, but changing the batch size terminates the process every time! I did not understand the part about the answer being cut off; could you please explain it a bit more?

samanemami avatar Jun 28 '23 07:06 samanemami

@samanemami truncated.

abhishekrai43 avatar Jun 28 '23 08:06 abhishekrai43

> @samanemami truncated.

So have you found any approach to generate a full answer?

samanemami avatar Jun 28 '23 10:06 samanemami

@samanemami Nopes

abhishekrai43 avatar Jun 28 '23 12:06 abhishekrai43

Hi @abhishekrai43 , sorry for the late reply. With more threads, now I can get a response in ~30 seconds. It was ~150 seconds with everything to the default values. So it's a big improvement but still not good enough.

sshu2017 avatar Jul 06 '23 17:07 sshu2017

llm = GPT4All(model=model_path, n_threads=24, n_ctx=model_n_ctx, backend='gptj', n_batch=model_n_batch, callbacks=callbacks, verbose=False)

In the above line what are the values for n_ctx and n_batch that you guys are using?

nishanth-k-10 avatar Aug 28 '23 05:08 nishanth-k-10

> Hello, to put it simply, when I ask a question to GPT, it takes ages to respond, and the processor or GPU is not being utilized at 100% or even 50%. [Screenshot: Capture d’écran du 2023-06-12 20-00-14]

Hey, can I ask how you are getting this CLI monitoring setup? I want to get that going on my PC.

mattehicks avatar Aug 28 '23 16:08 mattehicks
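The thread never names the tool, but the per-core view in those screenshots looks like htop (or plain top with the per-core display toggled), so that is an assumption. On Linux, the same per-core counters those tools render can be read directly:

```shell
# htop shows one bar per logical core, plus per-process CPU%; a busy
# multi-threaded process can report far more than 100% (e.g. 1600% = 16 cores).
#   sudo apt-get install -y htop && htop     # Debian/Ubuntu
# Plain top works too: run `top`, then press `1` for the per-core view.

# The raw per-core counters these tools read (Linux only):
grep '^cpu' /proc/stat
```

Each `cpuN` line in /proc/stat is one logical core's cumulative time in user/system/idle jiffies; monitors compute utilization from the deltas between samples.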

Try increasing the value of the n_threads parameter. For example, if you have 8 cores and 2 threads per core, you can set it as high as 8 * 2 = 16 threads. Just don't give it all of them.

nishanth-k-10 avatar Aug 30 '23 04:08 nishanth-k-10
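The arithmetic above (cores times threads per core) is exactly what `os.cpu_count()` reports, since it counts logical cores; the 8-core figure below is just the example from the comment, not a detected value:

```python
import os

# Example numbers from the comment above: 8 physical cores, SMT x2.
physical_cores = 8
threads_per_core = 2
max_threads = physical_cores * threads_per_core
print(max_threads)  # 16

# On the machine you are actually running on, the OS reports the
# logical-core count (cores x threads per core) directly:
print(os.cpu_count())
```

So rather than computing it by hand, `n_threads=os.cpu_count() - 2` (keeping a couple of cores free, per the "just don't give it all of them" advice) is a reasonable starting point.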

> llm = GPT4All

that's the spirit! Nice!

CRPrinzler avatar Jan 16 '24 09:01 CRPrinzler

https://github.com/imartinez/privateGPT/pull/1589

lolo9538 avatar Feb 13 '24 19:02 lolo9538