
Word test additions and old .doc document conversion

Open · Thomas-Rowlands opened this pull request 7 months ago · 1 comment

Description

Implementation of Word document text and table extraction.

Fixes #220
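The description above is a one-line summary. As a rough illustration of what text and table extraction from a Word file involves, here is a minimal standard-library sketch. It is not the code in autocorpus/word.py, and the names `extract_docx` and `_paragraph_text` are hypothetical. It assumes a .docx input, which is a ZIP archive whose body lives in `word/document.xml`, with paragraphs as `w:p`, text runs as `w:t`, and tables as `w:tbl` > `w:tr` > `w:tc`. Legacy .doc files use a binary format rather than ZIP/XML, so they would need converting to .docx first (one common route is LibreOffice's `soffice --headless --convert-to docx`).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _paragraph_text(p):
    """Hypothetical helper: concatenate every w:t text run inside a paragraph."""
    return "".join(t.text or "" for t in p.iter(W_NS + "t"))


def extract_docx(data: bytes):
    """Hypothetical sketch: return (paragraphs, tables) from raw .docx bytes.

    paragraphs: non-empty body-level paragraph strings (table text excluded)
    tables: one entry per top-level table, as a list of rows of cell strings
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(W_NS + "body")
    paragraphs, tables = [], []
    for child in body:
        if child.tag == W_NS + "p":
            text = _paragraph_text(child)
            if text.strip():
                paragraphs.append(text)
        elif child.tag == W_NS + "tbl":
            rows = []
            # Direct rows only, so nested tables are not flattened into this one
            for tr in child.findall(W_NS + "tr"):
                row = []
                for tc in tr.findall(W_NS + "tc"):
                    # A cell may hold several paragraphs; join them with spaces
                    row.append(" ".join(_paragraph_text(p) for p in tc.findall(W_NS + "p")))
                rows.append(row)
            tables.append(rows)
    return paragraphs, tables
```

Walking only the direct children of `w:body` keeps table text out of the paragraph list, which mirrors the text/table split the PR describes. Real documents also carry headers, footnotes, and tracked changes that this sketch ignores.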

Type of change

  • [ ] Documentation (non-breaking change that adds or improves the documentation)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Optimization (non-breaking, back-end change that speeds up the code)
  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] Breaking change (whatever its nature)

Key checklist

  • [x] All tests pass (e.g. pytest)
  • [ ] The documentation builds and looks OK (e.g. mkdocs)
  • [ ] Pre-commit hooks run successfully (e.g. pre-commit run --all-files)

Further checks

  • [ ] Code is commented, particularly in hard-to-understand areas
  • [ ] Tests added or an issue has been opened to tackle that in the future. (Indicate issue here: # (issue))

Thomas-Rowlands, May 20 '25 21:05

PS: you'll need to merge in main and resolve the merge conflicts. Doing that will also enable Codecov for this PR, so we can see what is and isn't covered by tests.

alexdewar, May 22 '25 11:05

Codecov Report

Attention: Patch coverage is 55.84906% with 117 lines in your changes missing coverage. Please review.

Files with missing lines         | Patch % | Lines
autocorpus/word.py               | 39.39%  | 53 missing, 7 partials
autocorpus/bioc_supplementary.py | 43.83%  | 41 missing
autocorpus/file_processing.py    | 55.00%  | 4 missing, 5 partials
autocorpus/ac_bioc/json.py       | 0.00%   | 5 missing
autocorpus/ac_bioc/passage.py    | 94.44%  | 0 missing, 1 partial
autocorpus/bioc_formatter.py     | 95.23%  | 0 missing, 1 partial
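The 117-line figure in the report can be cross-checked against the per-file rows. The short sketch below does that, assuming Codecov's usual definition in which partially covered lines count as not covered (coverage = hits / (hits + misses + partials)). The implied patch size (265 lines) and hit count (148) are inferred from the reported percentage, not stated in the report.

```python
# Per-file (missing, partial) line counts copied from the Codecov table above
uncovered = {
    "autocorpus/word.py": (53, 7),
    "autocorpus/bioc_supplementary.py": (41, 0),
    "autocorpus/file_processing.py": (4, 5),
    "autocorpus/ac_bioc/json.py": (5, 0),
    "autocorpus/ac_bioc/passage.py": (0, 1),
    "autocorpus/bioc_formatter.py": (0, 1),
}

missing = sum(m for m, _ in uncovered.values())
partials = sum(p for _, p in uncovered.values())
not_covered = missing + partials  # partials are treated as not covered

# Assumed definition: patch coverage = hits / (hits + misses + partials),
# so the total number of patch lines is not_covered / (1 - coverage)
patch_coverage = 0.5584906
total = round(not_covered / (1 - patch_coverage))
hits = total - not_covered

print(missing, partials, not_covered, total, hits)
print(f"{100 * hits / total:.5f}%")
```

Under that assumption the six files account for all 117 uncovered lines (103 missing plus 14 partials), and 148 / 265 reproduces the reported 55.84906% after rounding.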
Files with missing lines                   | Coverage Δ
autocorpus/ac_bioc/annotation.py           | 47.61% <100.00%> (+1.10%)
autocorpus/ac_bioc/bioctable/cell.py       | 100.00% <ø> (+14.28%)
autocorpus/ac_bioc/bioctable/collection.py | 100.00% <ø> (+30.00%)
autocorpus/ac_bioc/bioctable/document.py   | 100.00% <ø> (+25.00%)
autocorpus/ac_bioc/bioctable/json.py       | 50.00% <ø> (-30.77%)
autocorpus/ac_bioc/bioctable/passage.py    | 100.00% <ø> (+55.00%)
autocorpus/ac_bioc/collection.py           | 89.74% <100.00%> (-1.75%)
autocorpus/ac_bioc/document.py             | 100.00% <100.00%> (ø)
autocorpus/ac_bioc/location.py             | 61.90% <100.00%> (+4.76%)
autocorpus/ac_bioc/node.py                 | 90.47% <100.00%> (+5.47%)
... and 10 more

... and 1 file with indirect coverage changes


codecov[bot], May 27 '25 11:05

Implemented the suggested updates. Further refactors of the codebase will follow in separate PRs once the supplementary material features are in.

Thomas-Rowlands, Jun 03 '25 16:06