pdfquery issues

cache collision

1

Scraping two different PDFs yields the exact same results when using the `FileCache`. The problem is that `set_hash_key()` always computes the same key because the file has already been seeked at...
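The report is cut off, so the following is only a sketch of the general failure mode, assuming the key is derived by reading a file object whose position is no longer at the start. It does not reproduce pdfquery's `set_hash_key()`; `naive_key` and `content_key` are hypothetical helpers.

```python
import hashlib
import io


def naive_key(fileobj):
    """Hash whatever is left to read from the current position."""
    return hashlib.sha256(fileobj.read()).hexdigest()


def content_key(fileobj):
    """Hash the whole file by rewinding first, then restore the position."""
    position = fileobj.tell()
    fileobj.seek(0)
    digest = hashlib.sha256(fileobj.read()).hexdigest()
    fileobj.seek(position)
    return digest


# Two different in-memory "files", both already read to the end.
first = io.BytesIO(b"%PDF-1.4 first document")
second = io.BytesIO(b"%PDF-1.4 second document")
first.read()
second.read()

# Nothing is left to read, so both naive keys are the hash of b"".
collides = naive_key(first) == naive_key(second)
# Rewinding before hashing gives each file its own key.
distinct = content_key(first) != content_key(second)
print(collides, distinct)
```

Restoring the position after hashing keeps the helper free of side effects for callers that continue reading from the same file object.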

patxoca

Extract all words with their coordinates.

Hi, thank you for this wonderful library, which extracts text from PDF files. I want to use it in one of my projects, but I have some different requirements. I...

infoankit10

'PDFObjRef' object does not support indexing

7

`import pdfquery import sys pdf = pdfquery.PDFQuery(sys.argv[1]) pdf.load()` `Traceback (most recent call last): File "bin/parse_pdf.py", line 6, in <module> pdf.load() File "/usr/local/lib/python2.7/site-packages/pdfquery/pdfquery.py", line 385, in load self.tree = self.get_tree(*_flatten(page_numbers)) File "/usr/local/lib/python2.7/site-packages/pdfquery/pdfquery.py",...

travis-st

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

1

Environment: pdfquery=0.4.3, python=2.7.15. I use pdfquery to load [this pdf](http://www.iachina.cn/IC/tkk/03/62d11c2a-8fd6-4b00-aa55-cf9320cf72ae_TERMS.PDF) and encounter an error. The error information is as follows: pdf.load() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pdfquery/pdfquery.py", line 385, in load self.tree = self.get_tree(*_flatten(page_numbers)) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pdfquery/pdfquery.py",...
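The error message in the title names NULL bytes and control characters as the cause. A minimal sketch of removing them from a string before it goes into an XML tree might look like this. It assumes the PDF's text contains such characters, `strip_invalid_xml_chars` is a hypothetical helper, and where it would be applied inside pdfquery is not shown in the report.

```python
import re

# C0 control characters that XML 1.0 does not allow in text.
# Tab (\x09), line feed (\x0a) and carriage return (\x0d) are legal and kept.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_invalid_xml_chars(text):
    """Return text with NULL bytes and other disallowed control characters removed."""
    return _INVALID_XML_CHARS.sub("", text)


cleaned = strip_invalid_xml_chars("a\x00b\x0bc\td\n")
print(repr(cleaned))
```

This only covers the ASCII control range. XML 1.0 also excludes a few other code points (for example lone surrogates), which the sketch leaves untouched.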

vikotse

Windows only: pdfquery is locking the opened PDF file

1

I try to open PDF files to query data from them and then use that data to rename the PDF file. On Windows this code fails when renaming because the file is...
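The report is truncated, but assuming the rename fails because a handle on the PDF is still open, the general pattern is to finish reading inside a `with` block and rename only after the handle is closed. The sketch below uses plain file I/O as a stand-in for querying the PDF. `read_then_rename` is a hypothetical helper, and how pdfquery's own handle would be released is not covered here.

```python
import os
import tempfile


def read_then_rename(path, new_name):
    """Read what is needed, close the file, then rename it."""
    with open(path, "rb") as handle:
        header = handle.read(8)  # stand-in for extracting data from the PDF
    # The handle is closed here, so Windows no longer holds a lock on the file.
    target = os.path.join(os.path.dirname(path), new_name)
    os.rename(path, target)
    return header, target


# Demo with a throwaway file in a temporary directory.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "input.pdf")
with open(source, "wb") as f:
    f.write(b"%PDF-1.4\n")

header, renamed = read_then_rename(source, "renamed.pdf")
print(header, os.path.basename(renamed))
```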

iconberg

PdfQuery | .extract problem

I have a problem with the pdfquery library. I have been trying to figure it out for the last few days without success. **Code:** ![function](https://user-images.githubusercontent.com/36231151/55832490-fbe46180-5b15-11e9-8798-517ba71c6cbd.jpg)...

rutgervanheijningen

Fix range() page numbers for Python3 & prevent long cache file names

This should fix issue #67. In Python 3, a `range()` is not converted to a list by default, which breaks the `_flatten` function used for flattening the list of page...
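To illustrate the idea, here is a minimal sketch of a flattening helper that treats `range` objects like lists and tuples. It is not the code from the pull request or pdfquery's actual `_flatten`. `flatten_pages` is a hypothetical name.

```python
def flatten_pages(items):
    """Flatten nested lists, tuples and ranges of page numbers into one list."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple, range)):
            # In Python 3, range is its own sequence type, so it must be
            # listed explicitly to be expanded like a list.
            flat.extend(flatten_pages(item))
        else:
            flat.append(item)
    return flat


pages = flatten_pages([0, 2, 3, range(4, 8)])
print(pages)
```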

chk1

can load the pages I need

1

`pdf.load(0, 2, 3, range(4,8))` gives me this error: `TypeError: '>=' not supported between instances of 'range' and 'int'`
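The message suggests that a `range` object reached a numeric comparison unexpanded. The sketch below reproduces that comparison in plain Python 3 and shows how unpacking the range on the caller's side yields only integers. Passing the unpacked values to `pdf.load(...)` is an assumed workaround and is not tested here.

```python
pages = [0, 2, 3, range(4, 8)]

# Comparing the range object itself with an int fails in Python 3.
try:
    pages[3] >= 0
    message = None
except TypeError as exc:
    message = str(exc)

# Unpacking with * expands the range into individual page numbers,
# e.g. for a call such as pdf.load(0, 2, 3, *range(4, 8)).
unpacked = [0, 2, 3, *range(4, 8)]

print(message)
print(unpacked)
```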

Thug0416

How does pdfquery determine the index?

[Amazon_CF.pdf](https://github.com/jcushman/pdfquery/files/2096962/Amazon_CF.pdf) [Amazon.txt](https://github.com/jcushman/pdfquery/files/2096967/Amazon.txt) Hi jcushman! I am a freshman from Hong Kong and currently trying to find a way to read tables from PDF and work with its data. I tried...

SalmonTT

ValueError: Invalid attribute name u'AAPL:AKExtras'

3

Processing a PDF with annotations that have a colon in their key value gives an exception:
```
Traceback (most recent call last):
  File "test_ocr.py", line 633, in test_petition
    analyze =...
```
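One way to avoid this class of error is to map annotation keys to names that are safe as XML attribute names before they are set. The sketch below is an assumption about a possible workaround, not the fix used by pdfquery. `safe_attribute_name` is a hypothetical helper, and it deliberately allows only a conservative ASCII subset of what XML permits.

```python
import re


def safe_attribute_name(name):
    """Replace characters that are risky in XML attribute names with underscores."""
    # Keep ASCII letters, digits, underscore, dot and hyphen; replace the rest,
    # including the colon, which XML reserves for namespace prefixes.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
    # An attribute name must not start with a digit, dot or hyphen.
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


print(safe_attribute_name("AAPL:AKExtras"))
```

Note that two different keys can map to the same sanitized name (for example `a:b` and `a_b`), so a real fix would need to decide how to handle such clashes.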

speedplane
