doc2text

Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.

Results: 15 doc2text issues

Hi @jlsutherland, and thanks for this cool module. OCR is a hard problem, and you provide a pretty efficient and simple solution. Would you be interested in a PR with text...

The idea is to get a flat, white background before thresholding. To do this, a matrix of offsets is added to the image. This matrix is optimized in order to...
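The background-flattening idea in this issue can be sketched in a few lines of NumPy. The issue text is truncated, so how its offset matrix is optimized is not known; the sketch below swaps in a much simpler tile-based estimate (the brightest pixel in each tile is assumed to be paper). The names `flatten_background`, `threshold`, `tile`, and `cutoff` are hypothetical and are not part of doc2text.

```python
import numpy as np


def flatten_background(gray, tile=16):
    """Add a matrix of offsets so the estimated background becomes white.

    Assumption: the brightest pixel in each tile is background (paper).
    This is a stand-in for the optimization mentioned in the issue,
    whose details are not given there.
    """
    h, w = gray.shape
    img = gray.astype(np.float64)

    # Pad with edge values so both dimensions are multiples of `tile`.
    pad_h = (-h) % tile
    pad_w = (-w) % tile
    padded = np.pad(img, ((0, pad_h), (0, pad_w)), mode="edge")
    big_h, big_w = padded.shape

    # Brightest value per tile serves as the local background estimate.
    tiles = padded.reshape(big_h // tile, tile, big_w // tile, tile)
    background = tiles.max(axis=(1, 3))

    # Offset needed to lift each tile's background to 255, expanded
    # back to one value per pixel and cropped to the original size.
    per_tile = 255.0 - background
    offsets = np.repeat(np.repeat(per_tile, tile, axis=0), tile, axis=1)
    offsets = offsets[:h, :w]

    flat = np.clip(img + offsets, 0, 255).astype(np.uint8)
    return flat, offsets


def threshold(gray, cutoff=128):
    """Global threshold: pixels at or above `cutoff` become white."""
    return np.where(gray >= cutoff, 255, 0).astype(np.uint8)
```

A tile-wise estimate is crude: it produces blocky offsets and fails where a tile contains no paper at all. A smoother offset matrix would likely behave better on real scans, at the cost of more code.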

When doing:
```
import doc2text
doc = doc2text.Document()
doc.read('something.pdf')
doc.process()
```
I get:
```
Error in /usr/local/lib/python2.7/dist-packages/doc2text/page.py on line 23
dst is not a numpy array, neither a scalar
Error...
```

doc.read('/home/ubuntu/doc2text/test.jpg') File "/usr/local/lib/python2.7/dist-packages/doc2text/__init__.py", line 78, in read raise FileNotAcceptedException