spring-ai

Documents created by the TextSplitter class (doSplitDocuments method) get an incorrect page number in their metadata

Open iAMSagar44 opened this issue 1 year ago • 0 comments

After running this sample application (ai-openai-rag), I noticed that the vector_store table contains 12 entries, but the page_number field (in the metadata column) is 10 for every entry.

`vector_store=# select metadata from vector_store;
 metadata
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
(12 rows)`

It looks like the doSplitDocuments method in the TextSplitter class is generating this incorrect page_number for all the documents.
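To make the expected behavior concrete, here is a minimal, self-contained sketch of a splitter that gives every chunk a copy of its own source document's metadata, so chunks from page 1 keep `page_number` 1 and chunks from page 2 keep `page_number` 2. This is not the Spring AI implementation: the `Doc` record, the `split` method, and the fixed-size chunking are all hypothetical stand-ins chosen only for illustration, and I have not confirmed where in doSplitDocuments the wrong value actually comes from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChunkMetadataSketch {

    // Hypothetical stand-in for a document: text plus a metadata map.
    // This is NOT Spring AI's Document class.
    record Doc(String text, Map<String, Object> metadata) {}

    // Splits each document into fixed-size chunks. Every chunk receives
    // its own copy of the metadata of the document it was cut from.
    static List<Doc> split(List<Doc> docs, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Doc> chunks = new ArrayList<>();
        for (Doc doc : docs) {
            String text = doc.text();
            for (int start = 0; start < text.length(); start += chunkSize) {
                int end = Math.min(text.length(), start + chunkSize);
                // Copy the metadata per chunk so no map is shared between chunks.
                chunks.add(new Doc(text.substring(start, end),
                        new HashMap<>(doc.metadata())));
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Two hypothetical "pages", each carrying its own page_number.
        List<Doc> pages = List.of(
                new Doc("aaaaaa", Map.of("page_number", 1)),
                new Doc("bbbb", Map.of("page_number", 2)));
        for (Doc chunk : split(pages, 3)) {
            System.out.println(chunk.metadata().get("page_number") + " " + chunk.text());
        }
    }
}
```

With this behavior, the 12 rows in vector_store would be expected to carry the page number of the page each chunk came from, rather than a single value repeated for all of them.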

iAMSagar44 Mar 04 '24 10:03