haystack-core-integrations
PineconeDocumentStore raises an error on the metadata produced by DocumentSplitter
Describe the bug
PineconeDocumentStore raises an error when I try to index a document that was split by DocumentSplitter. Error message 👇
PineconeApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Date': 'Tue, 23 Jul 2024 12:46:03 GMT', 'Content-Type': 'application/json', 'Content-Length': '160', 'Connection': 'keep-alive', 'x-pinecone-request-latency-ms': '903', 'x-pinecone-request-id': '2298458388900737762', 'x-envoy-upstream-service-time': '37', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"Metadata value must be a string, number, boolean or list of strings, got '[{\"doc_id\":\"22e0...' for field '_split_overlap'","details":[]}
The Document object that raises the error is below. "_split_overlap" seems to be a list of dicts, which is not one of the metadata value types the error message allows.
Document(id=37fa03ca409f457046696a3bec987d5cb627f655cbcf0c019f7334bc170da4b8, content: 'Vegan Persimmon Flan
Recipe by Tilde Thurium
This makes 2 servings. Why did I write a recipe that...', meta: {'file_path': '/content/recipe_files/vegan_flan_recipe.md', 'source_id': 'a01a0ae2f396930e9cd3475986ae716cb26c554f6b49d4c61dfeb473ddeb7ced', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0, '_split_overlap': [{'doc_id': '0520d3c17150c5fd057a19bdc796e9f9c3a632f1d9acf730154d888ee3fc86be', 'range': (0, 305)}]})
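To make the failing field easier to spot, here is a minimal sketch that checks a meta dict against the rule quoted in the error message ("string, number, boolean or list of strings"). The helper names `is_supported_value` and `find_unsupported_fields` are hypothetical and not part of Haystack or the Pinecone client, and the rule is taken only from the error text above, so Pinecone may apply further limits that this check does not cover.

```python
def is_supported_value(value):
    """Return True if value is a string, number, boolean, or list of strings.

    The rule is copied from the Pinecone error message in this issue;
    it is an assumption that no other constraints apply.
    """
    if isinstance(value, (str, bool, int, float)):
        return True
    if isinstance(value, list):
        return all(isinstance(item, str) for item in value)
    return False


def find_unsupported_fields(meta):
    """Return the names of the meta fields whose values break the rule."""
    return [key for key, value in meta.items() if not is_supported_value(value)]


# Meta of the failing Document, copied from the repr above.
meta = {
    "file_path": "/content/recipe_files/vegan_flan_recipe.md",
    "source_id": "a01a0ae2f396930e9cd3475986ae716cb26c554f6b49d4c61dfeb473ddeb7ced",
    "page_number": 1,
    "split_id": 0,
    "split_idx_start": 0,
    "_split_overlap": [
        {
            "doc_id": "0520d3c17150c5fd057a19bdc796e9f9c3a632f1d9acf730154d888ee3fc86be",
            "range": (0, 305),
        }
    ],
}

print(find_unsupported_fields(meta))  # ['_split_overlap']
```

Only "_split_overlap" is flagged, which matches the field named in the HTTP response body.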
To Reproduce
import os
os.environ["PINECONE_API_KEY"] = "PINECONE-KEY"
from haystack_integrations.document_stores.pinecone import PineconeDocumentStore
document_store = PineconeDocumentStore(
    index="<ENTER_PINECONE_INDEX_NAME>",
    namespace="<ENTER_PINECONE-PROJECT-NAME>",
    dimension=1536,
    spec={"serverless": {"region": "us-east-1", "cloud": "aws"}},
)
from haystack.components.preprocessors import DocumentSplitter
from haystack import Document
source_docs = [Document(content="""
Vegan Persimmon Flan
Recipe by Tilde Thurium
This makes 2 servings. Why did I write a recipe that only makes 2 servings? It was the height of COVID, okay, don't judge me.
Tools:
2 ramekins
Blender
Ingredients:
½ cup persimmon pulp, strained. This takes 2 average sized fuyu persimmons. If they have seeds, remove them.
1 tbsp cornstarch
½ tsp agar agar
1 tbsp agave nectar, or to taste
2 tbsp granulated sugar
¼ cup coconut creme
½ cup almond milk
½ tsp vanilla
Steps
I tried making caramel with the [Full Of Plants](https://www.google.com/url?q=https%3A%2F%2Ffullofplants.com%2Feasy-vegan-caramel-sauce%2F) method but it was a pain in the ass and I burned myself.
For this recipe, just put the sugar at the bottom of the cup and it somehow magically turns into sauce. Lifehack!
Combine the cornstarch with the almond milk and stir it in.
whisk persimmon pulp, milk/cornstarch, agar agar, coconut creme, and agave in a saucepan. Bring to a boil.
The persimmon pulp got a little congealed, so I mixed it with an immersion blender. But you do you, boo.
Let the persimmon mixture cool a bit, for maybe 5 minutes. Stir in the vanilla. Pour it in to your ramekins or what have you.
Don’t forget and let it cool to room temperature. Agar agar waits for no man.
Refrigerate for at least 4 hours, or overnight.
To remove from ramekin, try the hot water bath method (didn’t work for me, maybe the water wasn’t hot enough.) Or just run a knife along the edges of the ramekin and jiggle it out.""")]
document_splitter = DocumentSplitter(split_by="word", split_length=40, split_overlap=10)
split_docs = document_splitter.run(documents=source_docs)
document_store.write_documents(documents=split_docs["documents"])
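A possible workaround until the integration handles this itself, sketched under assumptions: JSON-encode the nested field into a string before writing, since a string is one of the value types the error message accepts. The helper `encode_meta_fields` is hypothetical, not a Haystack or Pinecone API. This is not necessarily how the integration should fix it, and anything that reads "_split_overlap" afterwards would need to decode the string again.

```python
import json


def encode_meta_fields(meta, keys=("_split_overlap",)):
    """Return a copy of meta with the named fields JSON-encoded as strings.

    Hypothetical helper: fields that are missing or already strings are
    left untouched, and the input dict is not modified.
    """
    encoded = dict(meta)
    for key in keys:
        if key in encoded and not isinstance(encoded[key], str):
            encoded[key] = json.dumps(encoded[key])
    return encoded


# Reduced version of the failing meta from this issue.
meta = {
    "split_id": 0,
    "_split_overlap": [
        {
            "doc_id": "0520d3c17150c5fd057a19bdc796e9f9c3a632f1d9acf730154d888ee3fc86be",
            "range": (0, 305),
        }
    ],
}

safe_meta = encode_meta_fields(meta)

# The value is now a plain string; decoding it recovers the list,
# with the (0, 305) tuple coming back as the list [0, 305].
decoded = json.loads(safe_meta["_split_overlap"])
```

With the reproduction above, this would presumably be applied to each split document before the write, for example `doc.meta = encode_meta_fields(doc.meta)` for every `doc` in `split_docs["documents"]`, assuming `meta` can be reassigned on a Document.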
Describe your environment (please complete the following information):
- OS: Colab
- Haystack version: 2.3
- Integration version: 1.2.1