doc2text issues

it'd be nice if this could produce text-overlaid PDFs

7

Tesseract seems to be able to produce PDFs these days with the text overlaid on the image. This is useful for searching in the PDF when viewing it that way. It'd be...

jbothma
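For reference, the Tesseract command-line tool can write a searchable PDF (the image with an invisible text layer) when it is given the `pdf` config name, as in `tesseract <image> <outputbase> -l <lang> pdf`. Below is a minimal Python sketch that shells out to it. It assumes the `tesseract` binary is installed and on `PATH`, and the helper names `tesseract_pdf_command` and `make_searchable_pdf` are hypothetical, not part of doc2text.

```python
import subprocess
from pathlib import Path


def tesseract_pdf_command(image_path, output_base, lang="eng"):
    """Build the argv for `tesseract <image> <outbase> -l <lang> pdf`.

    Tesseract appends ".pdf" to output_base itself.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang, "pdf"]


def make_searchable_pdf(image_path, output_base, lang="eng"):
    """Run Tesseract (assumed installed) and return the path of the PDF it wrote."""
    subprocess.run(tesseract_pdf_command(image_path, output_base, lang), check=True)
    return Path(str(output_base) + ".pdf")
```

For example, `make_searchable_pdf("scan.png", "scan")` would write `scan.pdf` next to the input. Building the argument list separately keeps the command easy to inspect without running Tesseract.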

Image not cropped accurately

When I call the process_image() method, the image being processed is not cropped accurately (attached below). Which of the calls in the method would I need to modify (and how)...

tekurkaa

No module named PythonMagick

I have installed doc2text and the required packages, but when I try to import doc2text it gives me the error "No module named PythonMagick". ![2021-07-21](https://user-images.githubusercontent.com/33904670/126509150-05030fcd-1f07-43b8-9165-7c40716a4802.png)

atul219

Does it support stream data?

I have a Flask app that receives the file from the API, and I want to extract the text from it, but I don't want to save it on...

multinucliated
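Whether doc2text accepts streams isn't confirmed here; the Python 3.5 report below calls `doc.read('pdf-sample.pdf')` with a file path. Assuming `read()` needs a path, one workaround is to spool the uploaded bytes to a temporary file that is deleted as soon as processing finishes. A minimal sketch; `read_from_bytes` and the `reader` callback are hypothetical names:

```python
import os
import tempfile


def read_from_bytes(data, suffix, reader):
    """Write `data` to a temporary file, call reader(path), then delete the file.

    `reader` is any callable that takes a filesystem path. Do all the work
    that touches the file inside it, because the file is removed afterwards.
    """
    # mkstemp (rather than NamedTemporaryFile) lets the path be reopened by
    # another library even on Windows, since our own handle is closed first.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        return reader(path)
    finally:
        os.remove(path)
```

In a Flask view this might be called as `read_from_bytes(request.files["file"].read(), ".pdf", process)`, where `process` is your own function that runs the doc2text steps on the given path and returns the text.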

Python 3.5 compatibility

6

It seems the library is not 100% Python 3 compatible. When I try to run this simple code:

```
import doc2text
doc = doc2text.Document()
doc = doc2text.Document(lang="eng")
doc.read('pdf-sample.pdf')
```

I get:

```
Traceback (most recent...
```

andjelx

Python 3 compatibility fix

1

Adds an `xrange` replacement for Python 3.

andjelx
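The PR body doesn't show its diff. One common way to write such a shim is to fall back to the built-in `range`, which in Python 3 is already lazy, like Python 2's `xrange`:

```python
try:
    xrange  # exists on Python 2
except NameError:
    # Python 3 removed xrange; its range() is already a lazy sequence.
    xrange = range
```

With this at the top of a module, existing `xrange(...)` loops run unchanged on both versions.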

ModuleNotFoundError: No module named 'PyPDF2'

1

```
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    import doc2text
  File "/Users/Stan/Downloads/doc2text-master/doc2text/__init__.py", line 6, in <module>
    import PyPDF2 as pyPdf
ModuleNotFoundError: No module named 'PyPDF2'
```

alexauvray
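The traceback shows doc2text's `__init__.py` importing `PyPDF2` at import time, and the PythonMagick report above is the same kind of failure, so installing the missing package into the interpreter you run (e.g. `pip install PyPDF2`) is the usual fix. A small, hypothetical diagnostic that lists which of these modules the current interpreter cannot find, before you import doc2text; the module list is taken from the errors in these issues, not from doc2text's package metadata:

```python
import importlib.util

# Modules doc2text is reported to import (per the errors in these issues);
# this list is an assumption, not doc2text's declared dependencies.
DOC2TEXT_IMPORTS = ["PyPDF2", "PythonMagick"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DOC2TEXT_IMPORTS)
    print("missing:", ", ".join(missing) if missing else "none")
```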


it'd be nice if this could produce text-overlaid PDFs

Image not cropped accurately

No module named PythonMagick

Does it support stream data?

Python 3.5 compatibility

Python 3 compatibility fix

ModuleNotFoundError: No module named 'PyPDF2'

FileNotFoundError

text extraction from png files does not seem to work

Unable to process
