Fixes incorrect ID generation for identical chunks in RecursiveDocumentSplitter
Related Issues
- fixes #9508
Proposed Changes:
The issue occurred because Document.id is generated from the content and meta at creation time.
However, meta fields such as split_id and parent_id were added after the Document was instantiated, so they never contributed to the id. As a result, chunks whose content and creation-time meta were identical received identical ids.
- Includes a unit test that verifies uniqueness of IDs and correct metadata assignment
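
The ordering problem described above can be illustrated with a minimal, self-contained sketch. `SketchDocument`, `split_meta_after`, and `split_meta_before` are hypothetical names used only for illustration; the hashing scheme is a simplified stand-in and is not Haystack's actual implementation, and passing the meta at construction is one plausible way to avoid the collision, not necessarily the exact change in this PR.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchDocument:
    """Hypothetical stand-in: the id is hashed from content and meta at creation."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)
    id: str = ""

    def __post_init__(self) -> None:
        # The id is computed once, from whatever content and meta exist right now.
        if not self.id:
            payload = f"{self.content}{sorted(self.meta.items())}"
            self.id = hashlib.sha256(payload.encode("utf-8")).hexdigest()


def split_meta_after(chunks: List[str], parent_id: str) -> List[SketchDocument]:
    """Problematic order: create the document, then attach split metadata."""
    docs = []
    for i, chunk in enumerate(chunks):
        doc = SketchDocument(content=chunk)
        # Too late: the id was already hashed without these fields.
        doc.meta["split_id"] = i
        doc.meta["parent_id"] = parent_id
        docs.append(doc)
    return docs


def split_meta_before(chunks: List[str], parent_id: str) -> List[SketchDocument]:
    """Assumed remedy: pass the split metadata at construction so it feeds the id."""
    return [
        SketchDocument(content=chunk, meta={"split_id": i, "parent_id": parent_id})
        for i, chunk in enumerate(chunks)
    ]


chunks = ["same text", "same text", "other text"]
buggy = split_meta_after(chunks, "parent-1")
fixed = split_meta_before(chunks, "parent-1")

print(len({d.id for d in buggy}))  # 2: the two identical chunks collide
print(len({d.id for d in fixed}))  # 3: split_id makes every id distinct
```

In this sketch both variants end up with the same meta on each chunk; only the moment at which the meta is attached differs, which is why the ids diverge.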
How did you test it?
- Added a unit test: test_recursive_splitter_generates_unique_ids_and_correct_meta
Notes for the reviewer
Checklist
- [x] I have read the contributors guidelines and the code of conduct
- I have updated the related issue with new insights and changes
- I added unit tests and updated the docstrings
- [x] I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
- I documented my code
- [x] I ran pre-commit hooks and fixed any issue
Pull Request Test Coverage Report for Build 15689509354
Details
- 0 of 0 changed or added relevant lines in 0 files are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage increased (+0.001%) to 90.144%
| Totals | |
|---|---|
| Change from base Build 15683643191: | 0.001% |
| Covered Lines: | 11543 |
| Relevant Lines: | 12805 |