pdfquery

A fast and friendly PDF scraping library.

Results 27 pdfquery issues

Scraping two different PDFs yields the exact same results when using the `FileCache`. The problem is that `set_hash_key()` always computes the same key because the file is already seek at...
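The effect described in this report can be reproduced without pdfquery. The sketch below assumes the truncated sentence refers to the file position already sitting at the end of the stream when the key is computed. `hash_file_contents` is a hypothetical helper written for illustration, not pdfquery's `set_hash_key()`, and SHA-256 is an arbitrary choice of digest.

```python
import hashlib
import io


def hash_file_contents(file_obj):
    """Hash the whole stream regardless of the current read position.

    Hypothetical helper for illustration; not part of pdfquery.
    """
    original_position = file_obj.tell()
    file_obj.seek(0)                      # rewind so read() sees every byte
    digest = hashlib.sha256(file_obj.read()).hexdigest()
    file_obj.seek(original_position)      # leave the stream as we found it
    return digest


# Two in-memory "files" with different contents.
first = io.BytesIO(b"first PDF bytes")
second = io.BytesIO(b"second PDF bytes")

# Consume both streams, as a parser would before the key is computed.
first.read()
second.read()

# Hashing without rewinding reads nothing, so both keys are the hash of b"".
naive_first = hashlib.sha256(first.read()).hexdigest()
naive_second = hashlib.sha256(second.read()).hexdigest()

# Rewinding first gives a key that depends on the actual contents.
safe_first = hash_file_contents(first)
safe_second = hash_file_contents(second)
```

Here `naive_first` and `naive_second` are equal, while `safe_first` and `safe_second` differ. Restoring the original position afterwards keeps the helper from disturbing a caller that is still reading the stream.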

Hi, Thank you for this wonderful library which extracts text from pdf files. I want to use this into one of my projects but I have some different requirements. I...

`import pdfquery import sys pdf = pdfquery.PDFQuery(sys.argv[1]) pdf.load()` `Traceback (most recent call last): File "bin/parse_pdf.py", line 6, in pdf.load() File "/usr/local/lib/python2.7/site-packages/pdfquery/pdfquery.py", line 385, in load self.tree = self.get_tree(*_flatten(page_numbers)) File "/usr/local/lib/python2.7/site-packages/pdfquery/pdfquery.py",...

Environment: pdfquery=0.4.3, python=2.7.15. I used pdfquery to load [this pdf](http://www.iachina.cn/IC/tkk/03/62d11c2a-8fd6-4b00-aa55-cf9320cf72ae_TERMS.PDF) and encountered an error. The error information is as follows: pdf.load() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pdfquery/pdfquery.py", line 385, in load self.tree = self.get_tree(*_flatten(page_numbers)) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pdfquery/pdfquery.py",...

I am trying to open PDF files, query data from them, and then use that data to rename each PDF file. On Windows this code fails when renaming because the file is...

I have a problem with the pdfquery library. I have been trying to figure this out for the last few days, but I can't seem to solve it. **Code:** ![function](https://user-images.githubusercontent.com/36231151/55832490-fbe46180-5b15-11e9-8798-517ba71c6cbd.jpg)...

This should fix issue #67. In Python 3, a `range()` is not converted to a list by default, which breaks the `_flatten` function used for flattening the list of page...

`pdf.load(0, 2, 3, range(4,8))` gives me this error: `TypeError: '>=' not supported between instances of 'range' and 'int'`
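The `range` problem in the two items above comes from Python 3 returning a lazy `range` object where Python 2 returned a list. A minimal sketch of a flattening helper that expands ranges is shown below. `flatten_page_numbers` is a hypothetical name used for illustration only; it is not pdfquery's `_flatten`, and the actual fix in the library may differ.

```python
def flatten_page_numbers(items):
    """Flatten nested lists, tuples and ranges into one flat list.

    Hypothetical helper for illustration; not pdfquery's `_flatten`.
    """
    flat = []
    for item in items:
        if isinstance(item, (list, tuple, range)):
            # Expand containers (including Python 3 range objects) recursively.
            flat.extend(flatten_page_numbers(item))
        else:
            flat.append(item)
    return flat


# Same arguments as in the failing call: 0, 2, 3, range(4, 8)
page_numbers = flatten_page_numbers([0, 2, 3, range(4, 8)])
print(page_numbers)  # [0, 2, 3, 4, 5, 6, 7]
```

Treating `range` as one more container type keeps the integers comparable with `>=` afterwards, which is the comparison the reported `TypeError` complains about.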

[Amazon_CF.pdf](https://github.com/jcushman/pdfquery/files/2096962/Amazon_CF.pdf) [Amazon.txt](https://github.com/jcushman/pdfquery/files/2096967/Amazon.txt) Hi jcushman! I am a freshman from Hong Kong and currently trying to find a way to read tables from PDF and work with its data. I tried...

Processing a PDF with annotations that have a colon in their key value gives an exception: ``` Traceback (most recent call last): File "test_ocr.py", line 633, in test_petition analyze =...