feat: add page_number to metadata in DocumentSplitter
Related Issues
- fixes #6705
Proposed Changes:
I updated the DocumentSplitter methods so that they add a "page_number" field to the metadata of the output documents. This field contains the page number of the original document on which the output document can be found. The implementation is the same as the one in v1.25.x.
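To illustrate the idea, here is a minimal standalone sketch, not the code from this PR. It assumes that page breaks in the source text are marked with form feed characters ("\f") and that splitting is done by words; the function name `split_with_page_numbers` and the dictionary layout are hypothetical and chosen only for this example.

```python
from typing import Dict, List


def split_with_page_numbers(text: str, split_length: int = 2) -> List[Dict]:
    """Split text into chunks of `split_length` words and record the page each chunk starts on.

    Assumption: pages in `text` are separated by form feed characters ("\\f").
    """
    # Split on single spaces only, so form feeds stay inside the words and are not lost.
    words = text.split(" ")
    chunks = []
    page = 1  # pages are 1-indexed
    for start in range(0, len(words), split_length):
        chunk_text = " ".join(words[start:start + split_length])
        chunks.append({"content": chunk_text, "meta": {"page_number": page}})
        # Each form feed seen in this chunk moves the following chunks one page later.
        page += chunk_text.count("\f")
    return chunks
```

With the input `"a b\fc d e\ff g"` and `split_length=2`, the three resulting chunks would carry page numbers 1, 2 and 3, because each of the first two chunks contains one page break.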
How did you test it?
I added some new unit tests for this behaviour, but testing was mainly functional, since the change is based on previously working code.
Notes for the reviewer
This is my first contribution!!! The .gitignore change is there to counter a VSCode extension I have; I was not able to remove it from the commit.
Checklist
- I have read the contributors guidelines and the code of conduct ✅
- I have updated the related issue with new insights and changes ✅
- I added unit tests and updated the docstrings ✅
- I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:. ✅
- I documented my code ✅
- I ran pre-commit hooks and fixed any issue ✅
@CarlosFerLo thank you very much for your contribution, your PR looks good - I just left some small comments to improve things a bit - let me know if it's not clear or you need help with something.
Thanks, really appreciate it. Excited to be able to collaborate.
Pull Request Test Coverage Report for Build 8876963566
Details
- 0 of 0 changed or added relevant lines in 0 files are covered.
- 1 unchanged line in 1 file lost coverage.
- Overall coverage increased (+0.02%) to 90.12%
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| components/preprocessors/document_splitter.py | 1 | 98.57% |
| Total: | 1 | |
| Totals | |
|---|---|
| Change from base Build 8849558850: | 0.02% |
| Covered Lines: | 6330 |
| Relevant Lines: | 7024 |