
Fixes incorrect ID generation for identical chunks in RecursiveDocumentSplitter

Open · gulbaki opened this pull request 5 months ago · 1 comment

Related Issues

  • fixes #9508

Proposed Changes:

The issue occurred because Document.id is generated from the content and meta present at creation time.

However, meta fields such as split_id and parent_id were added only after the Document was instantiated. As a result, chunks with identical content (and identical meta at creation) produced identical ids.

  • Includes a unit test that verifies uniqueness of IDs and correct metadata assignment
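The failure mode described above can be reproduced with a minimal, self-contained sketch. SketchDocument, split_buggy, and split_fixed are hypothetical names used only for illustration: SketchDocument is a simplified stand-in that hashes just content and meta once at construction, not Haystack's real Document class, and split_fixed shows one way to avoid the collision (passing split_id and parent_id in at creation), not necessarily the exact diff in this PR.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchDocument:
    """Hypothetical stand-in: the id is a hash of content + meta, computed once."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)
    id: str = ""

    def __post_init__(self) -> None:
        # Only the content and meta present *at creation* feed the hash.
        if not self.id:
            payload = f"{self.content}{self.meta}".encode("utf-8")
            self.id = hashlib.sha256(payload).hexdigest()


def split_buggy(chunks: List[str], parent_id: str) -> List[SketchDocument]:
    docs = [SketchDocument(content=c) for c in chunks]
    for i, doc in enumerate(docs):
        # meta is mutated after creation, so the id never reflects these fields
        doc.meta["split_id"] = i
        doc.meta["parent_id"] = parent_id
    return docs


def split_fixed(chunks: List[str], parent_id: str) -> List[SketchDocument]:
    # split_id and parent_id are set at construction, so they feed the hash
    return [
        SketchDocument(content=c, meta={"split_id": i, "parent_id": parent_id})
        for i, c in enumerate(chunks)
    ]


chunks = ["same text", "same text"]
buggy = split_buggy(chunks, parent_id="doc-1")
fixed = split_fixed(chunks, parent_id="doc-1")
print(buggy[0].id == buggy[1].id)  # True: duplicate ids
print(fixed[0].id == fixed[1].id)  # False: unique ids
```

In the buggy variant both chunks end up with the same split metadata shape but were hashed while meta was still empty, so their ids collide; in the fixed variant the differing split_id makes each hash distinct.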

How did you test it?

  • Added a unit test: test_recursive_splitter_generates_unique_ids_and_correct_meta

Notes for the reviewer

Checklist

  • [x] I have read the contributors guidelines and the code of conduct
  • [ ] I have updated the related issue with new insights and changes
  • [ ] I added unit tests and updated the docstrings
  • [x] I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • [ ] I documented my code
  • [x] I ran pre-commit hooks and fixed any issue

gulbaki · Jun 14 '25 13:06

CLA assistant check
All committers have signed the CLA.

CLAassistant · Jun 14 '25 13:06

Pull Request Test Coverage Report for Build 15689509354

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.001%) to 90.144%

Totals
Change from base Build 15683643191: 0.001%
Covered Lines: 11543
Relevant Lines: 12805

💛 - Coveralls

coveralls · Jun 16 '25 19:06