gpt4-pdf-chatbot-langchain
Error: Failed to ingest your data
Unfortunately, I get this error when trying to ingest a PDF:

```
creating vector store... error [Error: PineconeClient: Error calling upsert: Error: PineconeClient: Error calling upsertRaw: FetchError: The request failed and the interceptors did not return an alternative response]
/Users/admin/gpt4-pdf-chatbot-langchain/scripts/ingest-data.ts:43
      throw new Error('Failed to ingest your data');
            ^
[Error: Failed to ingest your data]
```

I get the same error with different PDFs, which seems odd because the terminal prints parts of the PDF's text in green before the error message appears. Do you have any idea how to fix this?
Hey, there are several potential culprits behind this; I cover them in the discussions section. Here are the likely causes. Try them out and let me know if you still encounter issues.
Troubleshoot the following:

- In the config folder, update `PINECONE_INDEX_NAME` to match your index name in Pinecone.
- Upgrade your Node version to the latest. It's possible you're using a version of Node that doesn't support `fetch` natively.
- Make sure Dimensions in the Pinecone dashboard is set to 1536 (the dimensionality of OpenAI embeddings).
- Switch your Environment in Pinecone to `us-east1-gcp` if the other environment is causing issues.
- Ensure you have a `.env` file in the project root that contains valid API keys from the Pinecone dashboard.
- Pinecone has limits for each upsert operation; you can read them here, and some are listed below. If you are uploading massive PDF files, you just need to write a loop to ensure upserts don't exceed 100 chunks per request. I will make a PR to the LangChain repo to integrate this.
  - Max size for an upsert request is 2 MB. The recommended limit is 100 vectors per request.
  - Max metadata size per vector is 40 KB.
- Pinecone indexes of users on the Starter (free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter.
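The batching loop described above can be sketched roughly like this. Note this is a minimal illustration, not the actual ingest script: `chunkArray` and the `upsert` callback parameter are hypothetical names, and the real Pinecone client call will differ.

```typescript
// Pinecone's recommended limit is 100 vectors per upsert request.
const BATCH_SIZE = 100;

// Split an array of vectors into batches no larger than `size`.
function chunkArray<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical wrapper: `vectors` would come from embedding your PDF chunks,
// and `upsert` would wrap the Pinecone client's upsert call.
async function upsertInBatches<T>(
  vectors: T[],
  upsert: (batch: T[]) => Promise<void>
): Promise<void> {
  for (const batch of chunkArray(vectors, BATCH_SIZE)) {
    await upsert(batch); // each request stays at or under 100 vectors
  }
}
```

The same idea applies regardless of client version: accumulate embedded chunks, then issue one upsert per batch instead of one giant request.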
This pull request https://github.com/rhimanshu909/gpt4-pdf-chatbot-langchain/blob/main/scripts/ingest-data.ts addresses the error below:

```
creating vector store... error [Error: PineconeClient: Error calling upsert: TypeError: stream.getReader is not a function] [Error: Failed to ingest your data]
```
Thanks for the quick reply and suggestions, Mayo! Can you maybe tell me how to get a Pinecone Index Name? I couldn't figure it out unfortunately.
It's in the Pinecone dashboard > Indexes > Index Name
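Once you have the index name from the dashboard, it needs to match what the project reads at startup. As a rough sketch of the environment file (the exact variable names are assumptions; depending on the repo version, the index name may live in the config folder rather than `.env`):

```env
# .env in the project root — all values are placeholders
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_ENVIRONMENT=us-east1-gcp
PINECONE_INDEX_NAME=your-index-name
```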
I just merged a PR that sorts out this chunking issue.