spring-ai
Documents created by the TextSplitter class (doSplitDocuments method) get an incorrect page number in their metadata
After running the sample application (ai-openai-rag), I noticed that the vector_store table contains 12 entries, but the page_number field (in the metadata column) is 10 for every entry.
```
vector_store=# select metadata from vector_store;
metadata
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
(12 rows)
```
It looks like the doSplitDocuments method in the TextSplitter class assigns this incorrect page_number to all of the documents it produces.
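To illustrate one way this symptom could arise, here is a minimal, self-contained sketch. It is not the actual Spring AI code: `PageNumberSketch`, `Chunk`, `samplePages`, and both `split...` methods are hypothetical names, and `sample.pdf` is a made-up file name. The sketch assumes the splitter merges the metadata of every input page into a single shared map before creating the output documents, so the last page's `page_number` overwrites all earlier ones. For simplicity it produces one chunk per page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PageNumberSketch {

    // Hypothetical stand-in for a document: text plus its metadata map.
    record Chunk(String text, Map<String, Object> metadata) {}

    // Suspected pattern (an assumption, not confirmed library code):
    // all page metadata is merged into ONE map, so later pages overwrite
    // "page_number" and every chunk ends up sharing the last value.
    static List<Chunk> splitWithMergedMetadata(List<Chunk> pages) {
        Map<String, Object> merged = new HashMap<>();
        List<String> texts = new ArrayList<>();
        for (Chunk page : pages) {
            texts.add(page.text());
            merged.putAll(page.metadata());
        }
        List<Chunk> chunks = new ArrayList<>();
        for (String text : texts) {
            chunks.add(new Chunk(text, merged));
        }
        return chunks;
    }

    // Alternative: each chunk receives a copy of its own source page's metadata.
    static List<Chunk> splitWithPerPageMetadata(List<Chunk> pages) {
        List<Chunk> chunks = new ArrayList<>();
        for (Chunk page : pages) {
            chunks.add(new Chunk(page.text(), new HashMap<>(page.metadata())));
        }
        return chunks;
    }

    // Builds `count` fake pages numbered 1..count (illustrative data only).
    static List<Chunk> samplePages(int count) {
        List<Chunk> pages = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            Map<String, Object> metadata = new HashMap<>();
            metadata.put("file_name", "sample.pdf");
            metadata.put("page_number", i);
            pages.add(new Chunk("text of page " + i, metadata));
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Chunk> pages = samplePages(3);
        for (Chunk c : splitWithMergedMetadata(pages)) {
            System.out.println("merged:   " + c.metadata().get("page_number"));
        }
        for (Chunk c : splitWithPerPageMetadata(pages)) {
            System.out.println("per page: " + c.metadata().get("page_number"));
        }
    }
}
```

With three sample pages, the merged variant reports page 3 for every chunk, which mirrors the all-10 result above, while the per-page variant keeps 1, 2, and 3. If the real method behaves like the merged variant, keeping the metadata associated with each source document would likely fix it.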