py-pdf-parser

A Python tool to help extract information from structured PDFs.

Results 31 py-pdf-parser issues

Bumps [wand](https://github.com/emcconville/wand) from 0.6.9 to 0.6.10. Release notes Sourced from wand's releases. Wand 0.6.10 The 0.6.10 release is an immediate patch release to address additional segmentation faults, and Apple M1...

dependencies
python

Bumps [ddt](https://github.com/datadriventests/ddt) from 1.5.0 to 1.6.0. Release notes Sourced from ddt's releases. 1.6.0 What's Changed Moved @named_data into main ddt.py module so it can be imported. by @orgadish in datadriventests/ddt#109...

dependencies
python

Bumps [matplotlib](https://github.com/matplotlib/matplotlib) from 3.5.1 to 3.5.3. Release notes Sourced from matplotlib's releases. REL: v3.5.3 This is the third bugfix release of the 3.5.x series. This release contains several bug-fixes and...

dependencies
python

**Bug Report** `extract_table` re-orders the table rows by the `y` axis (top to bottom), which works for most cases. The issue comes if we have a table with a header...

bug
component: tables
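
To make the row re-ordering in the bug report above concrete, here is a minimal sketch of sorting cells top to bottom by their `y` coordinate. It is not py-pdf-parser's implementation of `extract_table`: the `Cell` class, the `rows_top_to_bottom` helper, and the coordinates are all hypothetical, and it assumes PDF-style coordinates where `y` grows upwards.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a PDF text element (not a py-pdf-parser class)."""
    text: str
    x0: float  # left edge
    y0: float  # bottom edge; assumed to grow upwards, as in PDF coordinates


def rows_top_to_bottom(cells):
    """Group cells sharing a y0 into rows, ordered top to bottom."""
    # A higher y0 is nearer the top of the page, so sort descending on y0,
    # then left to right within each row.
    ordered = sorted(cells, key=lambda c: (-c.y0, c.x0))
    return [
        [cell.text for cell in row]
        for _, row in groupby(ordered, key=lambda c: c.y0)
    ]


# Made-up cells, deliberately passed in a scrambled order.
cells = [
    Cell("b2", 50, 680), Cell("a1", 10, 700),
    Cell("b1", 50, 700), Cell("a2", 10, 680),
]
table = rows_top_to_bottom(cells)
print(table)
```

In a sketch like this the output depends only on page position, so whatever order the caller supplied is discarded. That is convenient in the common case, and it is also why any table whose intended row order differs from its vertical order would come out rearranged.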

https://github.com/jstockwin/py-pdf-parser/pull/218 adds a single proof of concept test for the visualise tool. We should test it more thoroughly.

priority: medium
difficulty: medium
enhancement
component: tests
component: visualise

#89 should have had an example added to the documentation. We already have an example that runs through a variety of tables, including ones which go over the page. We...

priority: low
difficulty: easy
component: tables
documentation

We should add some code coverage checks to the CI. Initially this will help us find untested areas (note the visualise tool is currently untested as we're not really sure...

priority: low
difficulty: medium
enhancement
component: meta

You can pass `show_info=True` to the visualise tool, and this allows you to click on elements and see details etc. It is unfinished and needs work. - The visuals need...

priority: low
difficulty: hard
enhancement
component: visualise

We [currently](https://github.com/jstockwin/py-pdf-parser/blob/master/py_pdf_parser/components.py#L173) call `.height`. However, from https://github.com/pdfminer/pdfminer.six/issues/202 it looks as though `LTChar` has a `size` attribute. We should use this instead. That said, we should check the PDFMiner code and...

priority: medium
difficulty: easy
component: components
enhancement
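
As a rough illustration of the `.height` versus `size` question in the issue above, the sketch below compares the two attributes on stand-in character objects. `FakeChar` is a hypothetical substitute for pdfminer's `LTChar`, the numbers are invented, and taking the most common per-character value as the element's font size is an assumption made for the example rather than a description of what `components.py` does.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeChar:
    """Hypothetical stand-in for pdfminer's LTChar; all values are made up."""
    text: str
    height: float  # bounding-box height
    size: float    # the attribute the issue suggests using instead


def dominant_value(chars, attr):
    """Return the most common value of `attr` across chars, rounded to 1 d.p."""
    counts = Counter(round(getattr(c, attr), 1) for c in chars)
    return counts.most_common(1)[0][0]


chars = [
    FakeChar("H", height=11.3, size=12.0),
    FakeChar("i", height=11.3, size=12.0),
    FakeChar("!", height=9.1, size=10.0),
]
by_height = dominant_value(chars, "height")
by_size = dominant_value(chars, "size")
print(by_height, by_size)
```

Whether the two attributes really diverge like this for a given PDF would need checking against the pdfminer.six code, as the issue itself notes.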

Bumps [shapely](https://github.com/shapely/shapely) from 1.8.2 to 1.8.5.post1. Release notes Sourced from shapely's releases. 1.8.5.post1 No release notes provided. 1.8.5 Packaging Python 3.11 wheels have been added to the matrix for all...

dependencies
python