PrivateGPT very inaccurate. Is it me?

Open jjjj12212 opened this issue 1 year ago • 7 comments

Hi,

For testing purposes, I got privateGPT to ingest the following data:

DATE	Customer	Amount
01-Jan	ABC		5
05-Jan	ABC		10
02-Feb	ABC		20
02-Jan	DEF		50
06-Feb	DEF		100
Feb-23	DEF		120

Once it finished ingesting the data, I ran the following query:

What is the total Amount?

> Answer:
 Total amount = 35 (12+17)
Total Amount = 25 + 17
=

> source_documents\data.txt:
DATE    Customer        Amount
01-Jan  ABC     5
05-Jan  ABC     10
02-Feb  ABC     20
02-Jan  DEF     50
06-Feb  DEF     100
Feb-23  DEF     120

Then my next question:

> Question:
Do you know the total amount for ABC? If you don't know the answer, just say that you don't know, don't try to make up an answer

> Answer:
 Yes, I do know the total amount for ABC = 25 + 17 = 42

So my question is: am I doing something wrong? Am I not formatting the data correctly?

Note that I did the same thing with free ChatGPT 3.5 and it answered it perfectly.
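
For reference, the expected answers can be checked with a few lines of Python. This is a minimal sketch and not part of privateGPT; it assumes the table is tab-separated with one tab between columns, and the names `DATA` and `totals` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# The ingested table, assumed to be tab-separated with a header row.
DATA = (
    "DATE\tCustomer\tAmount\n"
    "01-Jan\tABC\t5\n"
    "05-Jan\tABC\t10\n"
    "02-Feb\tABC\t20\n"
    "02-Jan\tDEF\t50\n"
    "06-Feb\tDEF\t100\n"
    "Feb-23\tDEF\t120\n"
)

def totals(text):
    """Return (grand_total, per_customer_totals) for the Amount column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    per_customer = defaultdict(int)
    for row in reader:
        per_customer[row["Customer"]] += int(row["Amount"])
    return sum(per_customer.values()), dict(per_customer)

grand_total, per_customer = totals(DATA)
print(grand_total)   # 305
print(per_customer)  # {'ABC': 35, 'DEF': 270}
```

So the correct total Amount is 305, and the total for ABC is 35; neither matches the answers above.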

jjjj12212 avatar May 18 '23 16:05 jjjj12212

I also forgot to mention, I'm using

ggml-gpt4all-j-v1.3-groovy.bin
ggml-model-q4_0.bin

And I also tried:

all-MiniLM-L6-v2
all-mpnet-base-v2

I also tried converting it to a CSV. The results are still bad.

After converting data.txt to a CSV file, the source now shows:

> source_documents\data.csv:
DATE: 01-Jan
Customer: ABC
Amount: 5

And no other lines.
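
The source shown above has the shape that a row-per-document CSV loader would produce. The sketch below is an assumption about what may be happening, not a confirmed description of privateGPT's ingestion code: it splits a CSV into one text chunk per row, so if only one chunk is passed to the model for a query, the model never sees the other rows. The names `CSV_TEXT` and `rows_as_documents` are hypothetical.

```python
import csv
import io

# Hypothetical CSV version of data.txt (comma-separated, header row).
CSV_TEXT = (
    "DATE,Customer,Amount\n"
    "01-Jan,ABC,5\n"
    "05-Jan,ABC,10\n"
    "02-Feb,ABC,20\n"
    "02-Jan,DEF,50\n"
    "06-Feb,DEF,100\n"
    "Feb-23,DEF,120\n"
)

def rows_as_documents(text):
    """Turn each CSV row into its own 'column: value' text chunk."""
    reader = csv.DictReader(io.StringIO(text))
    return ["\n".join(f"{key}: {value}" for key, value in row.items())
            for row in reader]

docs = rows_as_documents(CSV_TEXT)
print(len(docs))  # 6 separate chunks, one per row
print(docs[0])    # the first chunk has the same shape as the source shown above
```

If this is what happens, a question about a total cannot be answered from a single retrieved chunk, regardless of how good the model is at arithmetic.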

jjjj12212 avatar May 18 '23 16:05 jjjj12212

I'd be curious to know if you're able to resolve this.

I have been trying to ingest datasets in .csv format and have not been able to query it successfully.

wenger9 avatar May 19 '23 03:05 wenger9

Hi RSJ9,

I don't think this is resolved yet.

jjjj12212 avatar May 19 '23 04:05 jjjj12212

I think the main reason is that almost all small LLaMA-like models without RLHF are quite bad at mathematics. Vicuna's authors have also reported some limitations here: https://lmsys.org/blog/2023-03-30-vicuna/

tanhm12 avatar May 19 '23 18:05 tanhm12

I'm finding that it has difficulty with even basic data correlation for information that is in a csv file. Even for something very simple like this:

Campaign	Date	Delivered	Opens	Clicks	Conversions
Campaign 1	5/28/23	100	80	20	10
Campaign 2	5/29/23	150	120	30	15

When I ask it "Find the campaign with the most clicks", every query returns different results, because it makes up random assumptions that aren't part of the data set or that you're not asking about.
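
For comparison, the deterministic answer is easy to compute. This is a minimal sketch, assuming the columns are Campaign, Date, Delivered, Opens, Clicks, and Conversions, with two rows named "Campaign 1" and "Campaign 2"; the variable names are made up for illustration.

```python
# The two campaigns from the sample data, with the assumed column layout.
campaigns = [
    {"Campaign": "Campaign 1", "Date": "5/28/23", "Delivered": 100,
     "Opens": 80, "Clicks": 20, "Conversions": 10},
    {"Campaign": "Campaign 2", "Date": "5/29/23", "Delivered": 150,
     "Opens": 120, "Clicks": 30, "Conversions": 15},
]

# Pick the row with the largest value in the Clicks column.
best = max(campaigns, key=lambda row: row["Clicks"])
print(best["Campaign"], best["Clicks"])  # Campaign 2 30
```

Under that reading of the data, the expected answer is Campaign 2, with 30 clicks.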

bendwebs avatar Jun 03 '23 21:06 bendwebs

Would it be helpful to use a local model that is meant for a specific task, such as an instruct model?

wenger9 avatar Jun 04 '23 13:06 wenger9

Is this fixed? I seem to get trash answers too using vicuna-13b

altahookah avatar Jun 14 '23 07:06 altahookah