gpt4-pdf-chatbot-langchain: Vectors not properly uploaded to Pinecone

Even though the ingest run finishes without errors, the vectors don't look right:

The Pinecone index dimension is set to 1536, and the query gives the same result as yours:

```
{"vector":[0,0,0......0,0], "topK":5, "includeMetadata":true, "includeValues":true, "namespace":""}
```

This is an example of the fetch result:

```
{ "vectors": {}, "namespace": "demo" }
```

How is this possible? Is it because of the chunk size or some other parameter?

Thanks in advance

Mar 24 '23 12:03 VonLuisMarck

see #44

Mar 24 '23 13:03 glide-the

Thanks for the reply

However, the execution ends without errors; I get a successful ingest output.

Mar 24 '23 13:03 VonLuisMarck

I am having the same issue. Pinecone says I have 179 vectors, but when I try to fetch them by id number and namespace, it shows an empty vector.

Mar 25 '23 18:03 text2sql

Your vectors are inserted properly; otherwise the namespace wouldn't show a vector count. You can also verify this by checking that the source docs come back when you run the chatbot.

Pinecone only lets you query by id and/or vector, alongside other parameters (see the docs).

The id of each vector is a UUID, so you cannot simply enter an arbitrary number to look it up. Likewise, each vector is a unique list of values, e.g. [0.1,0.2,1.3...], across 1536 dimensions.

TL;DR: it's working.
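To make the query/fetch distinction concrete, here is a minimal sketch of the two request shapes. It assumes the JSON field names from the query at the top of this issue (`vector`, `topK`, `includeMetadata`, `includeValues`, `namespace`) and assumes a fetch request is just a list of `ids` plus a `namespace`. The helpers `buildQueryRequest` and `buildFetchRequest` are hypothetical; they only build plain objects and do not call Pinecone.

```typescript
// Assumed: matches the 1536-dimension index described in this issue.
const DIMENSION = 1536;

interface QueryRequest {
  vector: number[];
  topK: number;
  includeMetadata: boolean;
  includeValues: boolean;
  namespace: string;
}

interface FetchRequest {
  ids: string[];
  namespace: string;
}

// Hypothetical helper: a query searches by similarity, so it needs a full vector of the index dimension.
function buildQueryRequest(vector: number[], topK: number, namespace: string): QueryRequest {
  if (vector.length !== DIMENSION) {
    throw new Error(`expected ${DIMENSION} values, got ${vector.length}`);
  }
  return { vector, topK, includeMetadata: true, includeValues: true, namespace };
}

// Hypothetical helper: a fetch looks records up by exact id, so it needs the ids used at upsert time.
function buildFetchRequest(ids: string[], namespace: string): FetchRequest {
  if (ids.length === 0) {
    throw new Error("fetch needs at least one id");
  }
  return { ids, namespace };
}
```

For example, `buildQueryRequest(new Array(1536).fill(0), 5, "")` yields the same body as the all-zero query at the top of this issue, while a fetch only returns something if the ids are ones that were actually stored.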

Mar 26 '23 03:03 mayooear

How can I fetch them and check the content, like you did in the YouTube video? Also, when I chat, no answer comes back.

Mar 26 '23 07:03 text2sql

I have the same issue.

After running "npm run ingest", it shows "creating vector store and ingesting data... ingestion complete". Pinecone shows the vector count as well. But when I use fetch in Pinecone, all the vectors are empty.

Your kind reply will be greatly appreciated!

Apr 01 '23 03:04 Davidyhchen88

I asked the Pinecone team the same question some time ago; here is their response: https://www.loom.com/share/5dff5d3a7d6940d79288765bacb867bf

Apr 01 '23 12:04 text2sql

Thanks so much for the reply!

I followed the video and tried again but it is still empty.

But I may have found another reason; please let me know what you think.
`addDocuments` takes `ids` as an input, but `static async fromDocuments(docs, embeddings, dbConfig)` does not pass any `ids` when it calls `addDocuments`. Please see the details of both below:

1) `addDocuments`:

```
async addDocuments(documents, ids) {
  const texts = documents.map(({ pageContent }) => pageContent);
  return this.addVectors(await this.embeddings.embedDocuments(texts), documents, ids);
}
```

2) `fromDocuments`:

```
static async fromDocuments(docs, embeddings, dbConfig) {
  const args = dbConfig;
  args.textKey = dbConfig.textKey ?? "text";
  const instance = new this(embeddings, args);
  await instance.addDocuments(docs);
  return instance;
}
```

Apr 01 '23 20:04 Davidyhchen88

Sorry, please ignore my last comment.
I just found out that you define the ids in `addVectors(vectors, documents, ids)`: `const documentIds = ids == null ? documents.map(() => uuidv4()) : ids;`

So the reason the vectors are empty is unclear now, since everything in `addVectors(vectors, documents, ids)` looks fine and the upsert finished without errors. In Pinecone, the total vector count is correct; the only issue is that the vector contents are empty...
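The `documentIds` line quoted in this comment means that when `fromDocuments` calls `addDocuments(docs)` without ids, every chunk gets a fresh random UUID, so nothing is ever stored under `1`, `2`, and so on. Below is a simplified sketch of that defaulting step. `assignIds` is a hypothetical name, not a LangChain function, and Node's built-in `crypto.randomUUID()` stands in for the `uuidv4()` used in the quoted line.

```typescript
import { randomUUID } from "crypto";

// Simplified sketch of the id defaulting in addVectors (hypothetical helper, not LangChain's code):
// keep the caller's ids if any were passed, otherwise generate one random v4 UUID per document.
function assignIds<T>(documents: T[], ids?: string[]): string[] {
  return ids == null ? documents.map(() => randomUUID()) : ids;
}

// Shape of a version-4 UUID such as 7bb8ddcf-23d8-4921-ad72-478c8aa14fc8.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// Ingest without ids, as fromDocuments does.
const generated = assignIds(["chunk A", "chunk B"]);
console.log(generated.every((id) => UUID_V4.test(id))); // true
console.log(generated.includes("1")); // false: nothing is stored under "1"
```

Passing explicit ids (for example `assignIds(docs, ["doc-1", "doc-2"])`) would keep them unchanged, which is the only way a fetch by a human-chosen id could succeed.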

Apr 01 '23 22:04 Davidyhchen88

Has anyone figured this out yet? I followed that video link, messed around with the QUERY function, and was able to see the actual PDF text, but I still couldn't reproduce it with the FETCH function the way mayooear did in the YouTube video... until I copied and pasted the long vector id from the QUERY result, which was `7bb8ddcf-23d8-4921-ad72-478c8aa14fc8` in my case, and that seemed to work. Now I'm thinking this maybe wasn't why my chatbot doesn't respond.
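The behaviour described here (query returns the text, fetch only works with the exact UUID copied from a query match) can be reproduced with a small in-memory stand-in. This is a hypothetical mock, not the Pinecone client: `InMemoryIndex` and its methods are illustrative names, and ranking by cosine similarity is an assumption about the index metric. Its `fetch` leaves unknown ids out of the result, mirroring the empty `vectors` fetch result reported at the top of this issue.

```typescript
import { randomUUID } from "crypto";

interface StoredVector {
  id: string;
  values: number[];
  metadata: { text: string };
}

// Hypothetical in-memory stand-in for one namespace of a vector index (not the Pinecone client).
class InMemoryIndex {
  private records = new Map<string, StoredVector>();

  constructor(public readonly namespace: string) {}

  // Upsert without caller-supplied ids: every record gets a random UUID.
  upsert(items: { values: number[]; text: string }[]): void {
    for (const { values, text } of items) {
      const id = randomUUID();
      this.records.set(id, { id, values, metadata: { text } });
    }
  }

  // The vector count shown for the namespace.
  get count(): number {
    return this.records.size;
  }

  // Exact lookup by id: ids that were never stored are simply left out of `vectors`.
  fetch(ids: string[]): { vectors: Record<string, StoredVector>; namespace: string } {
    const vectors: Record<string, StoredVector> = {};
    for (const id of ids) {
      const record = this.records.get(id);
      if (record !== undefined) vectors[id] = record;
    }
    return { vectors, namespace: this.namespace };
  }

  // Similarity search: the topK records closest to `vector` by cosine similarity (assumed metric).
  query(vector: number[], topK: number): StoredVector[] {
    const dot = (a: number[], b: number[]) => a.reduce((sum, x, i) => sum + x * b[i], 0);
    const cosine = (a: number[], b: number[]) => {
      const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
      return denom === 0 ? 0 : dot(a, b) / denom;
    };
    return Array.from(this.records.values())
      .sort((r1, r2) => cosine(vector, r2.values) - cosine(vector, r1.values))
      .slice(0, topK);
  }
}

const index = new InMemoryIndex("demo");
index.upsert([
  { values: [1, 0, 0], text: "first chunk" },
  { values: [0, 1, 0], text: "second chunk" },
]);

console.log(index.count); // 2
console.log(index.fetch(["1"])); // { vectors: {}, namespace: 'demo' }

// Take the id from a query match, then fetch by that exact id.
const [best] = index.query([0.9, 0.1, 0], 1);
console.log(index.fetch([best.id]).vectors[best.id].metadata.text); // first chunk
```

The count is non-zero while a fetch by `"1"` is empty, which is the same combination reported in this thread; the data is there, it is just keyed by UUID.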


Apr 04 '23 21:04 johnydodger
