gpt4-pdf-chatbot-langchain
Vectors not properly uploaded to Pinecone
Even though the ingest run finishes without errors, the vectors don't seem right:
The Pinecone index dimension is set to 1536, and the query gives the same result as yours:
{"vector":[0,0,0......0,0], "topK":5, "includeMetadata":true, "includeValues":true, "namespace":""}
This is an example of the result of the fetch
{ "vectors":{}, "namespace":"demo" }
How is this possible? Is it because of the chunk size or some other parameter?
Thanks in advance
see #44
Thanks for the reply
However, the run finishes without errors and I get a successful ingest output.
I am having the same issue. Pinecone says I have 179 vectors, but when I try to fetch them by id number and namespace, it shows an empty result.

Your vectors are inserted properly; otherwise the namespace wouldn't display a vector count. You can verify this when you receive the source docs back whilst running the chatbot.
Pinecone only allows you to query by id and/or vector, alongside other params (see the Pinecone docs).
The id of each vector is a UUID, so you cannot simply query with an arbitrary number. Likewise, each vector is a unique list of values such as [0.1,0.2,1.3...] over 1536 dimensions.
TLDR it's working.
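To make the point above concrete, here is a minimal sketch of why a fetch with a guessed id comes back empty while the data is still there. It uses a hypothetical in-memory stand-in (`FakeIndex`), not the Pinecone client, and the vectors are shortened to 3 dimensions; only the first id is taken from this thread, the second is made up for illustration.

```javascript
// Hypothetical in-memory stand-in for a vector index. This is NOT the
// Pinecone client; it only imitates the fetch/query behaviour described above.
class FakeIndex {
  constructor() {
    this.namespaces = new Map(); // namespace -> Map(id -> record)
  }

  upsert(namespace, records) {
    if (!this.namespaces.has(namespace)) this.namespaces.set(namespace, new Map());
    const ns = this.namespaces.get(namespace);
    for (const record of records) ns.set(record.id, record);
  }

  // Exact-id lookup: ids that do not exist are simply left out of the result.
  fetch(namespace, ids) {
    const ns = this.namespaces.get(namespace) ?? new Map();
    const vectors = {};
    for (const id of ids) {
      if (ns.has(id)) vectors[id] = ns.get(id);
    }
    return { vectors, namespace };
  }

  // Similarity search (dot product here): returns the top K records with their ids.
  query(namespace, vector, topK) {
    const ns = this.namespaces.get(namespace) ?? new Map();
    const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);
    return [...ns.values()]
      .map((r) => ({ id: r.id, score: dot(vector, r.values), metadata: r.metadata }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

const index = new FakeIndex();
index.upsert("demo", [
  // Ids are UUIDs, as generated during ingest; the second one is made up.
  { id: "7bb8ddcf-23d8-4921-ad72-478c8aa14fc8", values: [0.1, 0.2, 1.3], metadata: { text: "page one" } },
  { id: "00000000-0000-4000-8000-000000000001", values: [0.4, 0.0, 0.9], metadata: { text: "page two" } },
]);

// Guessing an id returns an empty "vectors" object even though data exists.
const guessed = index.fetch("demo", ["1"]);
console.log(guessed); // { vectors: {}, namespace: 'demo' }

// Querying with any vector (even all zeros) returns matches with their real ids...
const matches = index.query("demo", [0, 0, 0], 5);

// ...and fetching by those ids returns the stored records.
const fetched = index.fetch("demo", matches.map((m) => m.id));
console.log(Object.keys(fetched.vectors).length); // 2
```

In other words, the workflow is query first to learn the ids, then fetch by those exact ids.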
How can I fetch them and check the content, like you did in the YT video? Also, when I chat, no answer comes back.
I have the same issue.
After running "npm run ingest", it shows "creating vector store and ingesting data... ingestion complete". Pinecone shows the vector count as well. But when I use fetch in Pinecone, all the vectors are empty.
Your kind reply will be greatly appreciated!
I asked the Pinecone guys the same question some time ago; here is their response: https://www.loom.com/share/5dff5d3a7d6940d79288765bacb867bf
Thanks so much for the reply!
I followed the video and tried again, but it is still empty.
But I found there could be another reason. Kindly let me know your thoughts.
addDocuments takes "ids" as input, but static async fromDocuments(docs, embeddings, dbConfig) does not pass "ids" when it calls addDocuments. Please see the details of the two below:
1)
async addDocuments(documents, ids) {
  const texts = documents.map(({ pageContent }) => pageContent);
  return this.addVectors(await this.embeddings.embedDocuments(texts), documents, ids);
}
2)
static async fromDocuments(docs, embeddings, dbConfig) {
  const args = dbConfig;
  args.textKey = dbConfig.textKey ?? "text";
  const instance = new this(embeddings, args);
  await instance.addDocuments(docs);
  return instance;
}
Sorry, please ignore my last comment.
I just found out that you have defined the ids in addVectors(vectors, documents, ids):
const documentIds = ids == null ? documents.map(() => uuidv4()) : ids;
So the reason why the vectors are empty is still not clear, as everything in addVectors(vectors, documents, ids) looks fine and the upsert finished without error. In Pinecone, the total vector count is correct; the only issue is that the vector contents are empty...
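For anyone following along, the id-defaulting line quoted above can be tried in isolation. This is only a sketch: `resolveDocumentIds` is a hypothetical helper name, and Node's built-in `randomUUID` is swapped in for the `uuidv4` call from the quoted code so the snippet runs without extra packages.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical helper mirroring the quoted line: use caller-supplied ids if
// given, otherwise generate one fresh UUID per document.
// randomUUID() stands in for uuidv4() from the quoted code.
function resolveDocumentIds(documents, ids) {
  return ids == null ? documents.map(() => randomUUID()) : ids;
}

const docs = [{ pageContent: "first chunk" }, { pageContent: "second chunk" }];

// No ids passed (as in fromDocuments): every document gets a random UUID.
const generated = resolveDocumentIds(docs);
console.log(generated);

// Ids passed explicitly: they are used as-is.
const supplied = resolveDocumentIds(docs, ["a", "b"]);
console.log(supplied); // [ 'a', 'b' ]
```

Because the generated ids are random, they cannot be guessed afterwards; they have to be read back from a query result before a fetch by id can succeed.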
Has anyone figured this out yet? I followed that video link, played around with the QUERY function, and was able to see the actual PDF text, but still wasn't able to reproduce what mayooear did with the FETCH function in the YT video... until I copied and pasted the long vector id from the QUERY result, which was "7bb8ddcf-23d8-4921-ad72-478c8aa14fc8" in my case (without quotes), and that seemed to work. So now I'm thinking this probably wasn't the reason my chatbot doesn't respond.
