pdfquery
pdfquery copied to clipboard
A fast and friendly PDF scraping library.
When loading a pdf via `pdf.load()`, specifically the Pathfinder 2E playtest pdf, this error is returned: ```Traceback (most recent call last): File "", line 1, in File "C:\Users\Graham Drakeley\.virtualenvs\Downloads-jPJuT-kW\lib\site-packages\pdfquery\pdfquery.py", line...
Update documentation, formatting, some imports. That's it :-)
Hi! We discovered that the can't concat str to bytes error on load() can be fixed : you must fix a bug on line 271 of pdfquery version 0.4.3 by...
Currently there's no way to read a password protected pdf file using `pdfquery`. Only `parser` arg is passed to `QPDFDocument` which later inits the `PDFDocument` with same (no `password` arg...
There is a bug in pdfquery ( see previous issue report). We switched to pdfminer and reduced processing time from 20 min to 2 min.
`pyquery` has dropped Python 2 support as of v1.4.3, but `pdfquery` has no upper bound on this dependency for its Python 2 build. See https://pypi.org/project/pyquery/ and view release notes for...
Replacing `all(...` with simple ands cuts time to 1/3 of what it took before. Found this after profiling on a pdf that required ~32m of these comparisons
On pypi, I see that the project is in `Development Status :: 4 - Beta`, but the last commit to this repository is more than half a year ago. Is...
Is there a way by which I can pass coordinates of a particular text and get locator as output. For eg: input in_bbox : [22,278, 67, 289] output: LTTextLineHorizontal:contains("address")') maybe...
The xml file generated correctly detects the line when there is padding in the line item. However, if there's no padding, the code is unable to detect any line. Can...