doc2text
Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.
Is this supported on Windows?
Hi @jlsutherland, and thanks for this cool module. OCR is a hard problem, and you provide a pretty efficient and simple solution. Would you be interested in a PR with text...
The idea is to get a flat, white background before thresholding. To do this, a matrix of offsets is added to the image. This matrix is optimized in order to...
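The snippet above is cut off before it says how the offset matrix is optimized, so the following is only a minimal sketch of the general idea and not the method proposed in the issue. It swaps in a much simpler heuristic: the brightest pixel in each tile is assumed to be paper, and the offset lifts it to 255. The names `flatten_background`, `threshold`, `tile`, and `cutoff` are hypothetical and are not part of doc2text.

```python
def flatten_background(image, tile=2):
    """Return (flattened, offsets) for a grayscale image given as rows of 0-255 ints.

    Assumption: the brightest pixel inside each tile is background paper,
    so the tile's offset is whatever lifts that pixel to 255.
    """
    height, width = len(image), len(image[0])
    offsets = [[0] * width for _ in range(height)]
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            rows = range(top, min(top + tile, height))
            cols = range(left, min(left + tile, width))
            background = max(image[r][c] for r in rows for c in cols)
            for r in rows:
                for c in cols:
                    offsets[r][c] = 255 - background
    # Add the offset matrix to the image, clipping at white.
    flattened = [
        [min(255, image[r][c] + offsets[r][c]) for c in range(width)]
        for r in range(height)
    ]
    return flattened, offsets


def threshold(image, cutoff):
    """Binarize: pixels at or above cutoff become white (255), the rest black (0)."""
    return [[255 if pixel >= cutoff else 0 for pixel in row] for row in image]


# Toy page: the left half is well lit, the right half sits in shadow.
sample = [
    [240, 100, 120, 20],
    [240, 240, 120, 120],
]
flattened, offsets = flatten_background(sample, tile=2)
binary_raw = threshold(sample, 200)      # shadowed paper is misread as ink
binary_flat = threshold(flattened, 200)  # paper is white in both halves
```

On this toy input, thresholding the raw image turns the shadowed paper black, while thresholding the flattened image keeps only the two ink pixels black. A real page would need a larger tile and a smoother background estimate than a per-tile maximum.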
When doing:
```
import doc2text
doc = doc2text.Document()
doc.read('something.pdf')
doc.process()
```
I get:
```
Error in /usr/local/lib/python2.7/dist-packages/doc2text/page.py on line 23
dst is not a numpy array, neither a scalar
Error...
```
doc.read('/home/ubuntu/doc2text/test.jpg')
File "/usr/local/lib/python2.7/dist-packages/doc2text/__init__.py", line 78, in read
raise FileNotAcceptedException