
feat: add page_number to metadata in DocumentSplitter

Open CarlosFerLo opened this issue 1 year ago • 2 comments

Related Issues

  • fixes #6705

Proposed Changes:

I updated the DocumentSplitter methods so that they add a "page_number" field to the metadata of each output document. This field holds the page number in the original document where that split can be found. The implementation is the same as the one in v1.25.x.
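To illustrate the idea, here is a minimal, self-contained sketch of word-based splitting that records a "page_number" for each split. It assumes that page breaks in the source text are marked with form-feed characters (`\f`), the convention Haystack's PDF converters follow, and it numbers each split by the page on which it starts. `Doc` and `split_by_word_with_pages` are hypothetical stand-ins, not Haystack's `Document` or `DocumentSplitter` API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus a metadata dict."""
    content: str
    meta: dict[str, Any] = field(default_factory=dict)


def split_by_word_with_pages(text: str, split_length: int, split_overlap: int = 0) -> list[Doc]:
    """Split `text` into windows of `split_length` words, tagging each with its start page.

    Assumes page breaks are marked with form feeds ("\\f"); pages are 1-indexed.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")
    if not text:
        return []

    # Re-attach the space separator to each word so "".join(units) == text.
    parts = text.split(" ")
    units = [p + " " for p in parts[:-1]] + [parts[-1]]

    step = split_length - split_overlap
    splits: list[Doc] = []
    page = 1
    start = 0
    while True:
        window = units[start:start + split_length]
        splits.append(Doc("".join(window), {"page_number": page}))
        if start + split_length >= len(units):
            break
        # Only the units we step past lie before the next split's start,
        # so only their page breaks advance the page counter.
        page += sum(u.count("\f") for u in units[start:start + step])
        start += step
    return splits
```

For example, `split_by_word_with_pages("a b\fc d\fe f", split_length=2)` yields two splits, `"a b\fc "` on page 1 and `"d\fe f"` on page 2, because the second split begins with "d", which sits after the first form feed. Counting page breaks only in the units stepped past (rather than in the whole previous window) keeps overlapping splits tagged with the page they actually start on.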

How did you test it?

I added some new unit tests for this behaviour, but testing was mainly functional, since the implementation is based on previously working code.

Notes for the reviewer

This is my first contribution!!! The .gitignore change counters a VSCode extension I use, whose changes I was not able to remove from the commit.

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes ✅
  • I added unit tests and updated the docstrings ✅
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:. ✅
  • I documented my code ✅
  • I ran pre-commit hooks and fixed any issue ✅

CarlosFerLo · Apr 25 '24 19:04

@CarlosFerLo thank you very much for your contribution, your PR looks good - I just left some small comments to improve things a bit - let me know if it's not clear or you need help with something.

davidsbatista · Apr 26 '24 17:04

Thanks, really appreciate it. Excited to be able to collaborate.

CarlosFerLo · Apr 26 '24 17:04

CLA assistant check
All committers have signed the CLA.

CLAassistant · Apr 29 '24 09:04

Pull Request Test Coverage Report for Build 8876963566

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 90.12%

Files with coverage reduction:
  • components/preprocessors/document_splitter.py: 1 new missed line, 98.57% coverage
Totals:
  • Change from base Build 8849558850: +0.02%
  • Covered lines: 6330
  • Relevant lines: 7024
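As a quick sanity check, the overall figure is covered lines divided by relevant lines, which reproduces the reported 90.12%:

$$
\text{coverage} = \frac{\text{covered lines}}{\text{relevant lines}} = \frac{6330}{7024} \approx 0.9012 = 90.12\%
$$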

💛 - Coveralls

coveralls · Apr 29 '24 10:04