private-gpt
PrivateGPT very inaccurate. Is it me?
Hi,
For testing purposes, I got privateGPT to ingest the following data:
DATE Customer Amount
01-Jan ABC 5
05-Jan ABC 10
02-Feb ABC 20
02-Jan DEF 50
06-Feb DEF 100
Feb-23 DEF 120
Once it finished ingesting the data, I ran the following query:
What is the total Amount?
> Answer:
Total amount = 35 (12+17)
Total Amount = 25 + 17
=
> source_documents\data.txt:
DATE Customer Amount
01-Jan ABC 5
05-Jan ABC 10
02-Feb ABC 20
02-Jan DEF 50
06-Feb DEF 100
Feb-23 DEF 120
Then my next question:
> Question:
Do you know the total amount for ABC? If you don't know the answer, just say that you don't know, don't try to make up an answer
> Answer:
Yes, I do know the total amount for ABC = 25 + 17 = 42
So my question here is: is there something I'm doing wrong? Am I not formatting the data correctly?
Note that I did the same thing with the free ChatGPT 3.5 and it answered perfectly.
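For reference, the correct answers can be computed directly from the ingested table. The minimal sketch below is plain Python with no privateGPT involved. It assumes the columns in data.txt are separated by whitespace exactly as shown above, and it sums Amount overall and per customer. The expected results are 305 for the total and 35 for ABC, so neither model answer above is right.

```python
from collections import defaultdict

# The table from data.txt, copied verbatim (whitespace-separated columns).
DATA = """DATE Customer Amount
01-Jan ABC 5
05-Jan ABC 10
02-Feb ABC 20
02-Jan DEF 50
06-Feb DEF 100
Feb-23 DEF 120"""


def totals(text: str) -> tuple[int, dict[str, int]]:
    """Return the grand total of Amount and a per-customer breakdown."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    cust_idx = header.index("Customer")
    amt_idx = header.index("Amount")
    per_customer: dict[str, int] = defaultdict(int)
    for line in lines[1:]:
        fields = line.split()
        per_customer[fields[cust_idx]] += int(fields[amt_idx])
    return sum(per_customer.values()), dict(per_customer)


grand_total, by_customer = totals(DATA)
print(grand_total)   # 305
print(by_customer)   # {'ABC': 35, 'DEF': 270}
```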
I also forgot to mention that I'm using:
ggml-gpt4all-j-v1.3-groovy.bin
ggml-model-q4_0.bin
And I also tried:
all-MiniLM-L6-v2
all-mpnet-base-v2
I also tried converting it to a CSV file, but the results are still bad.
After converting data.txt to a CSV file, the source now shows:
> source_documents\data.csv:
DATE: 01-Jan
Customer: ABC
Amount: 5
None of the other rows appear.
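One plausible explanation, which I have not confirmed against the privateGPT source: the CSV loader may store every row as a separate document made of `key: value` lines, which would match the single-row source output above. If retrieval then passes only a few of those row-documents to the model, a question like "What is the total Amount?" cannot be answered, because it needs all six rows at once. The stdlib sketch below only illustrates that assumed row-per-document split; `rows_to_documents` is a hypothetical helper, not privateGPT or LangChain code.

```python
import csv
import io

# The same data as data.csv (comma-separated, header row first).
CSV_TEXT = """DATE,Customer,Amount
01-Jan,ABC,5
05-Jan,ABC,10
02-Feb,ABC,20
02-Jan,DEF,50
06-Feb,DEF,100
Feb-23,DEF,120
"""


def rows_to_documents(csv_text: str) -> list[str]:
    """Hypothetical: turn each CSV row into its own 'key: value' text chunk."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        "\n".join(f"{key}: {value}" for key, value in row.items())
        for row in reader
    ]


docs = rows_to_documents(CSV_TEXT)
print(len(docs))  # 6 separate documents, one per data row
print(docs[0])    # the first row only, in the same shape as the source shown above
```

Under this assumption, each retrieved chunk holds a single row, so the model would have to be handed every chunk to add the amounts correctly.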
I'd be curious to know if you're able to resolve this.
I have been trying to ingest datasets in .csv format and have not been able to query them successfully.
Hi RSJ9,
I don't think this is resolved yet.
I think the main reason is that almost all small LLaMA-like models trained without RLHF are pretty bad at mathematics. Vicuna's authors have also reported some limitations here: https://lmsys.org/blog/2023-03-30-vicuna/
I'm finding that it has difficulty with even basic correlation of data in a CSV file, even for something as simple as this:
Campaign,Date,Delivered,Opens,Clicks,Conversions
Campaign 1,5/28/23,100,80,20,10
Campaign 2,5/29/23,150,120,30,15
When I ask it "Find the campaign with the most clicks", every query returns a different result, because it makes random assumptions that aren't part of the dataset or that you're not asking it about.
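For comparison, a direct lookup gives the same answer on every run. This minimal Python sketch assumes the campaign table is stored as comma-separated text with the header shown above; it parses the rows and picks the one with the highest Clicks value, which is Campaign 2 with 30 clicks.

```python
import csv
import io

# The campaign table from the comment above, as CSV text.
CAMPAIGNS = """Campaign,Date,Delivered,Opens,Clicks,Conversions
Campaign 1,5/28/23,100,80,20,10
Campaign 2,5/29/23,150,120,30,15
"""

rows = list(csv.DictReader(io.StringIO(CAMPAIGNS)))
# max() over the parsed rows, comparing the Clicks column as an integer.
best = max(rows, key=lambda row: int(row["Clicks"]))
print(best["Campaign"], best["Clicks"])  # Campaign 2 30
```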
Would it be helpful to use a local model that is tuned for a specific task, such as an instruct model?
Is this fixed? I seem to get trash answers with vicuna-13b too.