pypdfocr

Fails to run on Mac OS High Sierra

Open christmasjumper opened this issue 6 years ago • 11 comments

Hi, I'm getting the following when trying to run on High Sierra:

Traceback (most recent call last):
  File "/usr/local/bin/pypdfocr", line 9, in <module>
    load_entry_point('pypdfocr==0.9.1', 'console_scripts', 'pypdfocr')()
  File "/Library/Python/2.7/site-packages/pypdfocr/pypdfocr.py", line 492, in main
    script.go(sys.argv[1:])
  File "/Library/Python/2.7/site-packages/pypdfocr/pypdfocr.py", line 474, in go
    self._convert_and_file_email(self.pdf_filename)
  File "/Library/Python/2.7/site-packages/pypdfocr/pypdfocr.py", line 480, in _convert_and_file_email
    ocr_pdffilename = self.run_conversion(pdf_filename)
  File "/Library/Python/2.7/site-packages/pypdfocr/pypdfocr.py", line 363, in run_conversion
    ocr_pdf_filename = self.pdf.overlay_hocr_pages(img_dpi, hocr_filenames, pdf_filename)
  File "/Library/Python/2.7/site-packages/pypdfocr/pypdfocr_pdf.py", line 145, in overlay_hocr_pages
    text_pdf_filename = self.overlay_hocr_page(dpi, hocr_filename, img_filename)
  File "/Library/Python/2.7/site-packages/pypdfocr/pypdfocr_pdf.py", line 245, in overlay_hocr_page
    self.add_text_layer(pdf,hocr_basename,pg_num,height,dpi)
  File "/Library/Python/2.7/site-packages/pypdfocr/pypdfocr_pdf.py", line 349, in add_text_layer
    para.drawOn(pdf, x*72/dpi, height - y*72/dpi)
  File "/Library/Python/2.7/site-packages/reportlab/platypus/flowables.py", line 113, in drawOn
    self._drawOn(canvas)
  File "/Library/Python/2.7/site-packages/reportlab/platypus/flowables.py", line 94, in _drawOn
    self.draw()#this is the bit you overload
  File "/Library/Python/2.7/site-packages/pypdfocr/pypdfocr_pdf.py", line 72, in draw
    Paragraph.draw(self)
  File "/Library/Python/2.7/site-packages/reportlab/platypus/paragraph.py", line 1717, in draw
    self.drawPara(self.debug)
  File "/Library/Python/2.7/site-packages/reportlab/platypus/paragraph.py", line 2093, in drawPara
    blPara = self.blPara

I have followed all the instructions to install the dependencies using brew, etc.

christmasjumper, Aug 25 '18 18:08

I see this same issue but it only happens on some PDFs. I am including a (very similar) stack trace below.

Traceback (most recent call last):
  File "/usr/local/bin/pypdfocr", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 492, in main
    script.go(sys.argv[1:])
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 474, in go
    self._convert_and_file_email(self.pdf_filename)
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 480, in _convert_and_file_email
    ocr_pdffilename = self.run_conversion(pdf_filename)
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 363, in run_conversion
    ocr_pdf_filename = self.pdf.overlay_hocr_pages(img_dpi, hocr_filenames, pdf_filename)
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 145, in overlay_hocr_pages
    text_pdf_filename = self.overlay_hocr_page(dpi, hocr_filename, img_filename)
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 245, in overlay_hocr_page
    self.add_text_layer(pdf,hocr_basename,pg_num,height,dpi)
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 349, in add_text_layer
    para.drawOn(pdf, x*72/dpi, height - y*72/dpi)
  File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/flowables.py", line 113, in drawOn
    self._drawOn(canvas)
  File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/flowables.py", line 94, in _drawOn
    self.draw()#this is the bit you overload
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 72, in draw
    Paragraph.draw(self)
  File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/paragraph.py", line 1717, in draw
    self.drawPara(self.debug)
  File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/paragraph.py", line 2093, in drawPara
    blPara = self.blPara
AttributeError: RotatedPara instance has no attribute 'blPara'

rmspeers, Sep 12 '18 20:09

CC @virantha is this something you could take a look at?

This is not specific to macOS; it happens on Ubuntu as well, at least. Based on the trace, I believe it is not related to the OS.

I believe self.wrap() needs to be called earlier, somewhere higher up the call chain, but I am unsure where or with which arguments.

rmspeers, Sep 12 '18 20:09

I'm having the same problem on Arch Linux. From looking at the sources, no immediate fix comes to mind. Unfortunately, it also seems like this project is no longer being actively maintained.

mrpg, Sep 17 '18 15:09

On Debian 9.5 I am also seeing a very similar stack trace:

  File "/usr/local/bin/pypdfocr", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 492, in main
    script.go(sys.argv[1:])
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 474, in go
    self._convert_and_file_email(self.pdf_filename)
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 480, in _convert_and_file_email
    ocr_pdffilename = self.run_conversion(pdf_filename)
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr.py", line 363, in run_conversion
    ocr_pdf_filename = self.pdf.overlay_hocr_pages(img_dpi, hocr_filenames, pdf_filename)
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 145, in overlay_hocr_pages
    text_pdf_filename = self.overlay_hocr_page(dpi, hocr_filename, img_filename)
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 245, in overlay_hocr_page
    self.add_text_layer(pdf,hocr_basename,pg_num,height,dpi)
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 349, in add_text_layer
    para.drawOn(pdf, x*72/dpi, height - y*72/dpi)
  File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/flowables.py", line 113, in drawOn
    self._drawOn(canvas)
  File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/flowables.py", line 94, in _drawOn
    self.draw()#this is the bit you overload
  File "/usr/local/lib/python2.7/dist-packages/pypdfocr/pypdfocr_pdf.py", line 72, in draw
    Paragraph.draw(self)
  File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/paragraph.py", line 1717, in draw
    self.drawPara(self.debug)
  File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/paragraph.py", line 2093, in drawPara
    blPara = self.blPara
AttributeError: RotatedPara instance has no attribute 'blPara'

So far every file I have tested resulted in this error.

Luuk3333, Sep 18 '18 08:09

If you take a look at the release history of reportlab, a new release lines up with the start of this issue. I downgraded reportlab and got it working.

https://pypi.org/project/reportlab/#history

I only tried 3.4.0. It worked, so I didn't test any other versions.

sirdavidwong, Sep 26 '18 04:09

Can confirm it's working with reportlab 3.4.0. I also tried 3.5.0, 3.5.1, 3.5.2, and 3.5.4 without success (same error as above).

Luuk3333, Sep 26 '18 09:09

Same problem here on Ubuntu 16.04, and the reportlab downgrade to 3.4.0 fixed it. I had to uninstall the installed version and then install the old version, as follows:

pip uninstall reportlab
pip install reportlab==3.4.0
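
For anyone scripting around this, here is a minimal sketch of a check for whether the installed reportlab is likely affected. It assumes, based only on the reports in this thread, that 3.4.0 works and that releases from 3.5.0 onward trigger the error. The names `FIRST_AFFECTED`, `parse_version`, `is_affected` and `installed_reportlab_version` are made up for illustration.

```python
from importlib import metadata

# Assumption taken from this thread: 3.4.0 works, 3.5.0 and later fail.
FIRST_AFFECTED = (3, 5, 0)


def parse_version(text):
    """Turn '3.5.59' into (3, 5, 59), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad short versions such as '3.5' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version_text):
    """True if the version is at or above the first release reported to fail."""
    return parse_version(version_text) >= FIRST_AFFECTED


def installed_reportlab_version():
    """Return the installed reportlab version string, or None if it is missing."""
    try:
        return metadata.version("reportlab")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_reportlab_version()
    if version is None:
        print("reportlab is not installed")
    elif is_affected(version):
        print("reportlab %s is likely affected; 3.4.0 is reported to work" % version)
    else:
        print("reportlab %s predates the reported breakage" % version)
```

This only compares version numbers; it does not test whether pypdfocr actually runs.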

douglascrp, Nov 01 '18 18:11

Can confirm this fix, too. So the big question is what changed in reportlab that broke this. @christmasjumper, you should consider renaming your issue.

f0rdprefect, Nov 23 '18 13:11

In reportlab/platypus/paragraph.py, version 3.5.59, I commented out the lines below (lines 1803 to 1807). After that I could use Paragraph!

def wrap(self, availWidth, availHeight):
    # if availWidth<_FUZZ:
    #     #we cannot fit here
    #     return 0, 0x7fffffff
    # work out widths array for breaking

amixedcolor, Apr 12 '22 06:04

_FUZZ is assigned in reportlab/rl_settings; it's 1e-6. I call the wrap function via table.wrapOn. Thinking about it, in most explanations the second argument of wrap, availWidth, is given as the coordinate of the table. Maybe that is what causes the issue.
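
To make the mechanism concrete, here is a toy model of it. This is not reportlab's code: `ToyParagraph` is a made-up stand-in that only mimics the early return quoted above and the `blPara` lookup from the tracebacks.

```python
_FUZZ = 1e-6  # value quoted above; redefined here so the sketch stands alone


class ToyParagraph:
    """Stand-in for a paragraph flowable; NOT reportlab's actual class."""

    def __init__(self, text):
        self.text = text

    def wrap(self, avail_width, avail_height):
        if avail_width < _FUZZ:
            # Early return: the layout attribute is never created.
            return 0, 0x7FFFFFFF
        # Placeholder for the real line breaking.
        self.blPara = self.text.split()
        return avail_width, 12

    def draw(self):
        # Like the failing line in the tracebacks, this reads blPara unconditionally.
        return self.blPara


if __name__ == "__main__":
    ok = ToyParagraph("hello world")
    ok.wrap(200, 50)
    print(ok.draw())

    bad = ToyParagraph("hello world")
    bad.wrap(0, 50)  # too narrow, so wrap() returns before setting blPara
    try:
        bad.draw()
    except AttributeError as exc:
        print("draw failed:", exc)
```

If the real code follows the same shape, any caller that passes a width below _FUZZ and then draws will hit the AttributeError, which would explain why commenting out the check hides it.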

amixedcolor, Apr 12 '22 06:04

This was happening on p.drawOn() after I had called p.wrap() with availWidth=0. In my case the paragraph text was an empty string anyway, which might have contributed to the zero width. This condition was silently ignored in version 3.4.0. With 3.6.12 it causes this freaky error:

File "/usr/local/lib/python3.10/dist-packages/reportlab/platypus/paragraph.py", line 2464, in drawPara
  blPara = self.blPara
AttributeError: 'Paragraph' object has no attribute 'blPara'

Thanks @amixedcolor for your explorations, which helped me get to the bottom of this.
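
In case it helps someone, here is a rough sketch of a guard for calling code, as an alternative to patching reportlab or downgrading. `draw_if_laid_out` is a hypothetical helper, not part of reportlab. It assumes the object exposes `wrap(availWidth, availHeight)` and `drawOn(canvas, x, y)` like a reportlab Paragraph, and it relies on `blPara` being set by a successful wrap, which is an internal detail that may differ between versions.

```python
def draw_if_laid_out(flowable, canvas, x, y, avail_width, avail_height):
    """Wrap the flowable, then draw it only if wrap() produced a layout.

    Returns True if the flowable was drawn and False if it was skipped.
    """
    flowable.wrap(avail_width, avail_height)
    if not hasattr(flowable, "blPara"):
        # wrap() bailed out early (for example avail_width was 0), so
        # drawOn() would fail with the AttributeError shown above.
        return False
    flowable.drawOn(canvas, x, y)
    return True
```

Skipping the draw is only reasonable when there is nothing to show, as with my empty string. If the zero width is itself a bug, fixing the width passed to wrap() is the better route.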

BobStein, Nov 28 '22 02:11