
TokenTextSplitter creates documents with a different id than its input document

Open hamedmonji opened this issue 1 year ago • 3 comments

Bug description: When a document is passed to TokenTextSplitter, the documents returned by the split have a different id than the input document. In a pipeline scenario this is completely unexpected.

Environment: Kotlin + Java 21, Spring Boot 3.2.1, Spring AI 1.0.0-M1 (also on 1.0.0-SNAPSHOT)

Steps to reproduce: Create a document with a specific id and pass it to TokenTextSplitter. The returned document has a different id.

Expected behavior: The document id should not change in a text splitter, or in any DocumentTransformer for that matter, unless the change is made explicitly.

Minimal Complete Reproducible example:

    val input = Document("123", "content", emptyMap())
    val output = TokenTextSplitter().split(input).first()
    assert(input.id == output.id)
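For illustration only, here is a self-contained sketch of one possible way to keep chunks traceable to their source when a single input can produce several output documents. It does not use Spring AI classes: `Doc`, `splitKeepingParentId`, and the `parent_document_id` key are hypothetical names invented for this sketch, and the fixed-size chunking stands in for real token-based splitting.

```kotlin
import java.util.UUID

// Hypothetical stand-in for a document type; this is NOT Spring AI's Document class.
data class Doc(
    val id: String = UUID.randomUUID().toString(),
    val content: String,
    val metadata: Map<String, Any> = emptyMap(),
)

// Hypothetical metadata key chosen for this sketch.
const val PARENT_ID_KEY = "parent_document_id"

// Splits the content into fixed-size character chunks (a simplification of token splitting).
// Each chunk receives a freshly generated id, but the input id is recorded in its metadata
// so that every chunk can be traced back to the document it came from.
fun splitKeepingParentId(input: Doc, chunkSize: Int): List<Doc> =
    input.content.chunked(chunkSize).map { chunk ->
        Doc(content = chunk, metadata = input.metadata + (PARENT_ID_KEY to input.id))
    }

fun main() {
    val input = Doc(id = "123", content = "some longer content")
    val chunks = splitKeepingParentId(input, chunkSize = 8)
    chunks.forEach { println("${it.id} <- ${it.metadata[PARENT_ID_KEY]}: ${it.content}") }
}
```

Under this assumption, a pipeline step that needs the original id would read it from the chunk's metadata rather than from the chunk's own id. Whether the splitter should instead reuse the input id when it produces exactly one chunk is the open question this issue raises.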

hamedmonji, Aug 05 '24 11:08