pypdfocr
Fails to run on Mac OS High Sierra
Hi, I'm getting the following when trying to run on High Sierra:
Traceback (most recent call last):
File "/usr/local/bin/pypdfocr", line 9, in
I have followed all the instructions to install the dependencies using brew, etc.
I see this same issue but it only happens on some PDFs. I am including a (very similar) stack trace below.
Traceback (most recent call last):
File "/usr/local/bin/pypdfocr", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 492, in main
script.go(sys.argv[1:])
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 474, in go
self._convert_and_file_email(self.pdf_filename)
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 480, in _convert_and_file_email
ocr_pdffilename = self.run_conversion(pdf_filename)
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 363, in run_conversion
ocr_pdf_filename = self.pdf.overlay_hocr_pages(img_dpi, hocr_filenames, pdf_filename)
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 145, in overlay_hocr_pages
text_pdf_filename = self.overlay_hocr_page(dpi, hocr_filename, img_filename)
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 245, in overlay_hocr_page
self.add_text_layer(pdf,hocr_basename,pg_num,height,dpi)
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 349, in add_text_layer
para.drawOn(pdf, x*72/dpi, height - y*72/dpi)
File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/flowables.py", line 113, in drawOn
self._drawOn(canvas)
File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/flowables.py", line 94, in _drawOn
self.draw()#this is the bit you overload
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 72, in draw
Paragraph.draw(self)
File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/paragraph.py", line 1717, in draw
self.drawPara(self.debug)
File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/paragraph.py", line 2093, in drawPara
blPara = self.blPara
AttributeError: RotatedPara instance has no attribute 'blPara'
CC @virantha is this something you could take a look at?
This is not specific to macOS; it happens on Ubuntu as well, at least. Based on the trace, I don't believe it is related to the OS.
I believe self.wrap() needs to be called earlier, but I am unsure where or with which arguments.
I'm having the same problem on Arch Linux. From looking at the sources, no immediate fix comes to mind. Unfortunately, it also seems like this project is no longer being actively maintained.
On Debian 9.5 I too am seeing a very similar stack trace:
File "/usr/local/bin/pypdfocr", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 492, in main
script.go(sys.argv[1:])
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 474, in go
self._convert_and_file_email(self.pdf_filename)
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 480, in _convert_and_file_email
ocr_pdffilename = self.run_conversion(pdf_filename)
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 363, in run_conversion
ocr_pdf_filename = self.pdf.overlay_hocr_pages(img_dpi, hocr_filenames, pdf_filename)
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 145, in overlay_hocr_pages
text_pdf_filename = self.overlay_hocr_page(dpi, hocr_filename, img_filename)
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 245, in overlay_hocr_page
self.add_text_layer(pdf,hocr_basename,pg_num,height,dpi)
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 349, in add_text_layer
para.drawOn(pdf, x*72/dpi, height - y*72/dpi)
File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/flowables.py", line 113, in drawOn
self._drawOn(canvas)
File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/flowables.py", line 94, in _drawOn
self.draw()#this is the bit you overload
File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 72, in draw
Paragraph.draw(self)
File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/paragraph.py", line 1717, in draw
self.drawPara(self.debug)
File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/paragraph.py", line 2093, in drawPara
blPara = self.blPara
AttributeError: RotatedPara instance has no attribute 'blPara'
So far every file I have tested resulted in this error.
If you take a look at the release history of reportlab, it lines up with when this issue started. I downgraded reportlab and got it working.
https://pypi.org/project/reportlab/#history
I only tried 3.4.0; it worked, so I didn't try any other versions.
Can confirm it's working with reportlab 3.4.0. I also tried 3.5.0, 3.5.1, 3.5.2, and 3.5.4 without success (same error as above).
Same problem here on Ubuntu 16.04, and downgrading reportlab to 3.4.0 fixed it. I had to uninstall the installed version and then install the old version, as follows:
pip uninstall reportlab
pip install reportlab==3.4.0
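To check which reportlab version is installed before or after the downgrade, something like the following sketch may help. It only encodes the versions reported in the comments above (3.4.0 working; 3.5.0, 3.5.1, 3.5.2 and 3.5.4 failing). The function names are hypothetical, and it assumes Python 3.8 or newer for importlib.metadata, whereas the tracebacks above come from Python 2.7.

```python
from importlib import metadata

# Versions as reported by commenters in this thread; not an exhaustive list.
REPORTED_WORKING = {"3.4.0"}
REPORTED_FAILING = {"3.5.0", "3.5.1", "3.5.2", "3.5.4"}


def classify_reportlab(version):
    """Return what this thread reports about a given reportlab version."""
    if version in REPORTED_WORKING:
        return "reported working"
    if version in REPORTED_FAILING:
        return "reported failing"
    return "not reported in this thread"


def installed_reportlab_version():
    """Return the installed reportlab version string, or None if absent."""
    try:
        return metadata.version("reportlab")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_reportlab_version()
    if found is None:
        print("reportlab is not installed")
    else:
        print("reportlab", found, "->", classify_reportlab(found))
```

Versions outside those two sets are deliberately labelled as unreported rather than guessed at.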
Can confirm this fix, too. So the big question is what changed in reportlab that broke this. @christmasjumper, you should consider renaming your issue.
In reportlab/platypus/paragraph.py, version 3.5.59, I commented out the lines below (lines 1803 to 1807). Then I could use Paragraph!
def wrap(self, availWidth, availHeight):
    # if availWidth<_FUZZ:
    #     #we cannot fit here
    #     return 0, 0x7fffffff
    # work out widths array for breaking
_FUZZ is assigned in reportlab/rl_settings; it is 1e-6. I call the wrap function via table.wrapOn. Thinking about it, most of the explanations I have seen treat the second argument of wrap, availWidth, as a coordinate of the table. Maybe that is what causes the issue.
This was happening on p.drawOn() after I had called p.wrap() with availWidth=0. In my case the paragraph text was an empty string anyway, which might have contributed to the zero width. This condition was silently ignored in version 3.4.0. With 3.6.12 it causes this freaky error:
File "/usr/local/lib/python3.10/dist-packages/reportlab/platypus/paragraph.py", line 2464, in drawPara
blPara = self.blPara
AttributeError: 'Paragraph' object has no attribute 'blPara'
Thanks @amixedcolor for your explorations, which helped me get to the bottom of this.
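For anyone who would rather guard the call site than patch reportlab, here is a minimal sketch of the idea described above: skip drawing when the available width is effectively zero, and skip when wrap() did not leave a blPara attribute behind. The helper name safe_draw_paragraph and the FUZZ constant are hypothetical (FUZZ just copies the 1e-6 value quoted earlier), and the helper only relies on the wrap() and drawOn() methods that appear in the traces. It is not a change to pypdfocr or reportlab itself.

```python
FUZZ = 1e-6  # assumption: mirrors the _FUZZ value quoted earlier in the thread


def safe_draw_paragraph(para, canvas, x, y, avail_width, avail_height):
    """Wrap and draw a Paragraph-like object, skipping it if it cannot be laid out.

    Returns True if the paragraph was drawn, False if it was skipped.
    """
    if avail_width < FUZZ:
        # With (almost) no width, wrap() may return early without laying
        # out the text, and the later draw call then fails on blPara.
        return False

    para.wrap(avail_width, avail_height)

    if not hasattr(para, "blPara"):
        # wrap() did not produce line-break data, so drawing would raise
        # the AttributeError shown in the tracebacks.
        return False

    para.drawOn(canvas, x, y)
    return True
```

A call such as safe_draw_paragraph(p, pdf, x, y, width, height) would then replace a bare p.wrap(...) followed by p.drawOn(...). Whether silently skipping the paragraph is acceptable depends on the document; logging the skipped text may be preferable.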