
Some exceptions. Also `jbig2` vs. `jbig2enc`.

Open DiagonalArg opened this issue 1 year ago • 2 comments

Looks like a nice tool, thanks. I'm running scans2pdf on the output of scantailor-advanced. Some exceptions occurred, so here is the output of running it on the first page. I can provide the image if it would be useful.

Note that Ubuntu 22.04 has a snap providing `jbig2enc`, while djpdf is looking for a program named `jbig2`.
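To illustrate the naming mismatch, here is a minimal sketch of a lookup that accepts either executable name. It is not djpdf's actual detection code. `find_program` and `JBIG2_CANDIDATES` are hypothetical names, and whether the snap really exposes its binary as `jbig2enc` is an assumption on my part.

```python
import shutil


def find_program(candidates, which=shutil.which):
    """Return (name, path) for the first candidate found on PATH, else None."""
    for name in candidates:
        path = which(name)
        if path is not None:
            return name, path
    return None


# Hypothetical alias list: the name from the warning first, then the
# name the snap package might install (an assumption, not verified).
JBIG2_CANDIDATES = ("jbig2", "jbig2enc")

if __name__ == "__main__":
    print(find_program(JBIG2_CANDIDATES))
```

The `which` parameter only exists so the lookup can be exercised without touching the real PATH.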

```
$ scans2pdf -v Feigon-001-000.crop_2R.tif scans2.pdf
WARNING:Program not found: jbig2
DEBUG:Using selector: EpollSelector
DEBUG:Running command: ['convert', '-colorspace', 'sRGB', '-profile', '/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/argyllcms-srgb.icm', '-background', '#ffffff', '-alpha', 'remove', '-alpha', 'off', '-type', 'TrueColor', '/home/user/Feigon-001-000.crop_2R.tif', '/var/tmp/djpdf-xwxq05cu/image.png']
DEBUG:convert-im6.q16: profile 'icc': 'RGB ': RGB color space not permitted on grayscale PNG `/var/tmp/djpdf-xwxq05cu/image.png' @ warning/png.c/MagickPNGWarningHandler/1668.

DEBUG:Running command: ['convert', '-fill', '#000000', '-opaque', '#000000', '-fill', '#000000', '-opaque', '#000000', '-threshold', '0', '/var/tmp/djpdf-xwxq05cu/image.png', '/var/tmp/djpdf-nma9ouby/image.png']
DEBUG:Running command: ['identify', '-units', 'PixelsPerInch', '-format', '%x %y', '/var/tmp/djpdf-xwxq05cu/image.png']
DEBUG:Running command: ['convert', '-fill', '#ffffff', '-opaque', '#000000', '-resize', '50%', '/var/tmp/djpdf-xwxq05cu/image.png', '/var/tmp/djpdf-_4oeppyf/image.png']
DEBUG:Running command: ['identify', '-format', '%w %h', '/var/tmp/djpdf-xwxq05cu/image.png']
DEBUG:Running command: ['convert', '-format', '%c', '/var/tmp/djpdf-nma9ouby/image.png', 'histogram:info:-']
DEBUG:Running command: ['convert', '-format', '%c', '/var/tmp/djpdf-_4oeppyf/image.png', 'histogram:info:-']
DEBUG:Running command: ['convert', '-fill', '#000000', '-opaque', '#000000', '-fill', '#000000', '-opaque', '#000000', '-threshold', '0', '/var/tmp/djpdf-xwxq05cu/image.png', '/var/tmp/djpdf-0h9szie5/image.png']
DEBUG:Running command: ['tesseract', '-l', 'eng', '--dpi', '600', '/var/tmp/djpdf-0h9szie5/image.png', '/var/tmp/djpdf-hogdbirb/ocr', 'hocr']
DEBUG:Tesseract Open Source OCR Engine v4.1.1 with Leptonica

INFO:Can't extract textangle from ocr_line: bbox 716 941 826 987; baseline 0 0; x_size 61; x_descenders 15.25; x_ascenders 15.25
DEBUG:Exception occurred:
Traceback (most recent call last):
  File "/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/hocr.py", line 46, in extract_text
    textangle = textangle_regex.search(line.attrib["title"]).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

INFO:Can't extract textangle from ocr_line: bbox 643 1005 898 1043; baseline -0.008 -6; x_size 37; x_descenders 7; x_ascenders 8
DEBUG:Exception occurred:
Traceback (most recent call last):
  File "/home/user/.local/pipx/venvs/djpdf/lib/python3.10/site-packages/djpdf/hocr.py", line 46, in extract_text
    textangle = textangle_regex.search(line.attrib["title"]).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

DEBUG:Running command: ['convert', '-alpha', 'remove', '-alpha', 'off', '-colorspace', 'gray', '-threshold', '50%', '-compress', 'fax', '/var/tmp/djpdf-nma9ouby/image.png', '/var/tmp/djpdf-82nt536j/image.pdf']
DEBUG:Running command: ['qpdf', '--stream-data=preserve', '--object-streams=preserve', '--normalize-content=n', '--newline-before-endstream', '--linearize', '/var/tmp/djpdf-1p_vtodx/temp.pdf', '/home/user/scans2.pdf']
```
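
For what it's worth, both tracebacks look like the same case: the `ocr_line` title has no `textangle` property, so `search()` returns `None` and `.group(1)` fails. A minimal sketch of a guarded lookup follows. The pattern `TEXTANGLE_RE`, the helper `extract_textangle`, and the choice to treat a missing property as an angle of 0 are all my assumptions, not the code in `djpdf/hocr.py`.

```python
import re

# Hypothetical pattern; the real textangle_regex in djpdf/hocr.py may differ.
TEXTANGLE_RE = re.compile(r"\btextangle\s+(-?\d+(?:\.\d+)?)")


def extract_textangle(title, default=0.0):
    """Return the textangle from an hOCR title string, or default if absent."""
    match = TEXTANGLE_RE.search(title)
    if match is None:
        # Assumption: a line without a textangle property is unrotated.
        return default
    return float(match.group(1))


# Title copied from the first INFO message in the log above.
title = ("bbox 716 941 826 987; baseline 0 0; x_size 61; "
         "x_descenders 15.25; x_ascenders 15.25")
print(extract_textangle(title))                           # 0.0
print(extract_textangle("bbox 0 0 10 10; textangle 90"))  # 90.0
```

Checking the match before calling `.group(1)` would avoid raising and logging an exception for every line that lacks the property.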

DiagonalArg, Jun 02 '23 12:06