py-pdf-parser issues

Results 31 py-pdf-parser issues

Bump wand from 0.6.9 to 0.6.10

Bumps [wand](https://github.com/emcconville/wand) from 0.6.9 to 0.6.10. Release notes, sourced from wand's releases: Wand 0.6.10. The 0.6.10 release is an immediate patch release to address additional segmentation faults, and Apple M1...

dependabot[bot]

dependencies

python

Bump ddt from 1.5.0 to 1.6.0

Bumps [ddt](https://github.com/datadriventests/ddt) from 1.5.0 to 1.6.0. Release notes, sourced from ddt's releases: 1.6.0. What's Changed: Moved @named_data into main ddt.py module so it can be imported, by @orgadish in datadriventests/ddt#109...

dependabot[bot]

dependencies

python

Bump matplotlib from 3.5.1 to 3.5.3

Bumps [matplotlib](https://github.com/matplotlib/matplotlib) from 3.5.1 to 3.5.3. Release notes, sourced from matplotlib's releases: REL: v3.5.3. This is the third bugfix release of the 3.5.x series. This release contains several bug-fixes and...

dependabot[bot]

dependencies

python

extract_table ignores ordering defined while loading the document

**Bug Report** `extract_table` re-orders the table rows by the `y` axis (top to bottom), which works for most cases. The issue comes if we have a table with a header...
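The re-ordering described in this report can be illustrated without the library itself. The following is a minimal sketch, not py-pdf-parser's implementation: `Row`, `sort_top_to_bottom`, and `loaded_order` are hypothetical names, and plain tuples stand in for PDF elements. It only shows that sorting by vertical position discards an order established earlier.

```python
from typing import NamedTuple


class Row(NamedTuple):
    text: str
    y: float  # assumed PDF-style coordinates: larger y is nearer the top of the page


def sort_top_to_bottom(rows):
    """Re-order rows purely by vertical position, top of the page first."""
    return sorted(rows, key=lambda row: row.y, reverse=True)


# Hypothetical order chosen by the caller when the document was loaded.
loaded_order = [Row("a", 100.0), Row("b", 300.0), Row("c", 200.0)]

resorted = sort_top_to_bottom(loaded_order)

# True only if sorting by y happened to preserve the caller's order.
custom_order_kept = [row.text for row in resorted] == [row.text for row in loaded_order]
```

Any fix would presumably need the caller's order to take precedence over the `y` sort, but how the library should do that is not stated in the visible part of the report.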

paulopaixaoamaral

bug

component: tables

Add more tests for the visualise tool

https://github.com/jstockwin/py-pdf-parser/pull/218 adds a single proof of concept test for the visualise tool. We should test it more thoroughly.

jstockwin

priority: medium

difficulty: medium

enhancement

component: tests

component: visualise

Add `remove_duplicate_header_rows` flag to a documentation example

#89 should have had an example added to the documentation. We already have an example that runs through a variety of tables, including ones which go over the page. We...
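As a rough idea of what such a documentation example would demonstrate, here is a minimal sketch of dropping repeated header rows from a table that continues over a page break. It is an assumption-laden illustration, not the library's code: `drop_repeated_header_rows` is a hypothetical helper, and the table is a plain list of lists of strings rather than extracted PDF elements.

```python
def drop_repeated_header_rows(table):
    """Keep the first row as the header and drop later rows identical to it."""
    if not table:
        return []
    header = table[0]
    return [header] + [row for row in table[1:] if row != header]


# Hypothetical table whose header is repeated at the top of a second page.
table = [
    ["Name", "Qty"],
    ["apple", "3"],
    ["Name", "Qty"],
    ["pear", "5"],
]

cleaned = drop_repeated_header_rows(table)
```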

jstockwin

priority: low

difficulty: easy

component: tables

documentation

Add code coverage checks to CI

We should add some code coverage checks to the CI. Initially this will help us find untested areas (note the visualise tool is currently untested as we're not really sure...

jstockwin

priority: low

difficulty: medium

enhancement

component: meta

Finish the info screen on visualise tool

You can pass `show_info=True` to the visualise tool, and this allows you to click on elements and see details etc. It is unfinished and needs work. - The visuals need...

jstockwin

priority: low

difficulty: hard

enhancement

component: visualise

Use LTChar.size to extract the font size

We [currently](https://github.com/jstockwin/py-pdf-parser/blob/master/py_pdf_parser/components.py#L173) call `.height`. However, from https://github.com/pdfminer/pdfminer.six/issues/202 it looks as though `LTChar` has a `size` attribute. We should use this instead. That said, we should check the PDFMiner code and...
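One cautious way to make the switch is to prefer `size` when it exists and fall back to `height` otherwise. The sketch below assumes only what the issue states, namely that a character object may expose a `size` attribute alongside `height`. `font_size` is a hypothetical helper, and `SimpleNamespace` objects stand in for real `LTChar` instances, which are not constructed here.

```python
from types import SimpleNamespace


def font_size(char):
    """Prefer a `size` attribute when present, otherwise fall back to `height`."""
    size = getattr(char, "size", None)
    return size if size is not None else char.height


# Stand-ins for character objects with and without a `size` attribute.
with_size = SimpleNamespace(size=12.0, height=13.7)
without_size = SimpleNamespace(height=13.7)

sizes = [font_size(with_size), font_size(without_size)]
```

The fallback keeps the current behaviour for any object that lacks `size`, so the change could be introduced without breaking existing callers.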

jstockwin

priority: medium

difficulty: easy

component: components

enhancement

Bump shapely from 1.8.2 to 1.8.5.post1

Bumps [shapely](https://github.com/shapely/shapely) from 1.8.2 to 1.8.5.post1. Release notes, sourced from shapely's releases: 1.8.5.post1: No release notes provided. 1.8.5: Packaging: Python 3.11 wheels have been added to the matrix for all...

dependabot[bot]

dependencies

python
