gpt4-pdf-chatbot-langchain

Error: Failed to ingest your data

Open Arche151 opened this issue 1 year ago • 4 comments

Unfortunately, I get this error when trying to ingest the PDF:

creating vector store...
error [Error: PineconeClient: Error calling upsert: Error: PineconeClient: Error calling upsertRaw: FetchError: The request failed and the interceptors did not return an alternative response]
/Users/admin/gpt4-pdf-chatbot-langchain/scripts/ingest-data.ts:43
    throw new Error('Failed to ingest your data');
    ^

[Error: Failed to ingest your data]

I get the same error with different PDFs, which seems weird because the terminal prints parts of the PDF's text in green before the error message. Do you have any idea how to fix this?

Arche151 • Mar 20 '23 09:03

Hey, there are several potential culprits behind this. I cover them in the discussions section and have listed them below. Try them out and let me know if you still encounter issues.

Troubleshoot the following:

  • In the config folder, set PINECONE_INDEX_NAME to match your index name in Pinecone.

  • Upgrade Node to the latest version. You may be using a version of Node that doesn't support fetch natively (fetch is built in from Node 18 onward).

  • Make sure Dimensions is set to 1536 in the Pinecone dashboard. (This is the OpenAI embeddings dimension.)

  • Switch your Environment in Pinecone to us-east1-gcp if the other environment is causing issues.

  • Ensure you have a .env file in the project root that contains valid API keys from the Pinecone dashboard.

  • Pinecone has limits for each upsert operation; some are listed below. If you are uploading massive PDF files, you need to write a loop so that each upsert request contains no more than 100 chunks. I will make a PR to the LangChain repo to integrate this.

    • Max size for an upsert request is 2 MB. The recommended upsert limit is 100 vectors per request.
    • Max metadata size per vector is 40 KB.

  • Pinecone indexes of users on the Starter (free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter.
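
The loop mentioned in the upsert-limit bullet could look roughly like the sketch below. It is not the code from any PR in this repo or in LangChain: `toBatches`, `upsertInBatches`, and the `upsertBatch` callback are hypothetical names, and the callback stands in for whatever call actually sends one upsert request to Pinecone. The default of 100 comes from the recommended limit quoted above.

```typescript
// Hypothetical helper (not from the repo): split `items` into batches of at most `size`.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical wrapper: `upsertBatch` is an assumption standing in for the
// function that sends a single upsert request to Pinecone.
async function upsertInBatches<T>(
  items: T[],
  upsertBatch: (batch: T[]) => Promise<void>,
  batchSize: number = 100 // recommended vectors per request, per the list above
): Promise<number> {
  const batches = toBatches(items, batchSize);
  for (const batch of batches) {
    // Sequential on purpose: a failing request stops the run at that batch.
    await upsertBatch(batch);
  }
  return batches.length; // number of requests sent
}
```

Note that batching by count alone does not guarantee that a request stays under the 2 MB size limit; chunks with large metadata may need a smaller batch size.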

mayooear • Mar 20 '23 17:03

This pull request https://github.com/rhimanshu909/gpt4-pdf-chatbot-langchain/blob/main/scripts/ingest-data.ts addresses the issue below:

creating vector store...
error [Error: PineconeClient: Error calling upsert: TypeError: stream.getReader is not a function]
[Error: Failed to ingest your data]

rhimanshu909 • Mar 21 '23 00:03

Thanks for the quick reply and suggestions, Mayo! Can you maybe tell me how to get a Pinecone Index Name? I couldn't figure it out unfortunately.

Arche151 • Mar 21 '23 04:03

It's in the Pinecone dashboard > Indexes > Index Name
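
With the index name in hand, a small check run before ingestion can catch two of the configuration problems raised earlier in this thread: a Node version without native fetch, and missing keys in .env. The sketch below is only an illustration. `preflight` is a hypothetical helper, and the default variable names are assumptions, so adjust them to whatever your .env actually defines.

```typescript
// Hypothetical pre-flight check (not part of the repo).
// The default names in `required` are assumptions; match them to your .env file.
function preflight(
  env: Record<string, string | undefined>,
  required: string[] = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENVIRONMENT"]
): string[] {
  const problems: string[] = [];

  // Node versions before 18 do not expose a global fetch by default.
  if (typeof (globalThis as { fetch?: unknown }).fetch !== "function") {
    problems.push("global fetch is missing; upgrade Node to 18 or later");
  }

  // Report every required variable that is unset or blank.
  for (const name of required) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(`${name} is not set`);
    }
  }
  return problems;
}

// Example use inside a Node script, after loading .env:
//   const problems = preflight(process.env);
//   if (problems.length > 0) console.error(problems.join("\n"));
```

This only checks that the values are present. It cannot tell whether a key is valid or whether the index dimension is 1536, so those still have to be verified in the Pinecone dashboard.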

shynsky • Mar 21 '23 15:03

I just merged a PR that sorts out this chunking issue.

mayooear • Mar 21 '23 21:03