Stirling-PDF

In the PDF output by OCR, the Chinese text is all n

Open CrazyBunQnQ opened this issue 1 year ago • 1 comments

When I run OCR on Chinese PDFs, the text is recognized correctly (the Chinese text appears in the output text file), but any text copied from the output PDF comes out as a string of "n" characters. If I remove the images when exporting, the pages look completely blank.

log:

   **** Error: Tf refers to an unknown resource name: F2 Assuming it's a font name.
                   Output may be incorrect.

I'm running in Docker. I installed Chinese fonts via apt install fonts-noto-cjk-extra, but it still doesn't work.
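One way to confirm the symptom without copying text by hand is to measure how much of the extracted text layer is actually CJK. The sketch below is only an illustration: the function name cjk_ratio is hypothetical, it counts just the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and it assumes the PDF's text has already been extracted with a tool of your choice.

```python
def cjk_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are CJK ideographs.

    Hypothetical helper: only the basic CJK Unified Ideographs block is counted.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


if __name__ == "__main__":
    # Stand-in strings; in practice these would come from the OCR text file
    # and from the text layer extracted from the output PDF.
    sidecar_text = "中文文本"
    pdf_layer_text = "nnnn nnnn"
    print(cjk_ratio(sidecar_text))
    print(cjk_ratio(pdf_layer_text))
```

A ratio near zero for the PDF text layer, alongside a high ratio for the text file, would match the behavior reported here.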

Jan 05 '24 08:01 CrazyBunQnQ

I also encountered this problem. The output text file contains normal Chinese text, but the Chinese text copied from the PDF is all "n".

Jan 09 '24 03:01 StimeKe

I also encountered this problem. The output text file contains normal Chinese text, but the Chinese text copied from the PDF is all "n". However, selecting "Sandwich" as the rendering type does produce Chinese characters in the output PDF, although the recognition may not be accurate enough.
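For anyone who wants to try the same workaround outside the web UI, here is a minimal sketch that builds the OCRmyPDF command line. The flags are the ones that appear in the command Stirling-PDF logs in this thread; the only change is the value passed to --pdf-renderer. The function names and file names are hypothetical, and this is not how Stirling-PDF itself assembles the command.

```python
import subprocess


def build_ocr_command(input_pdf, output_pdf, renderer="sandwich",
                      languages=("chi_sim", "eng")):
    """Build an ocrmypdf argv list (hypothetical helper, not Stirling-PDF code)."""
    return [
        "ocrmypdf",
        "--output-type", "pdf",
        # The logged command uses "hocr"; "sandwich" is the workaround reported
        # in this thread, and "auto" is what OCRmyPDF's own warning suggests.
        "--pdf-renderer", renderer,
        "--skip-text",
        "--language", "+".join(languages),
        input_pdf,
        output_pdf,
    ]


def run_ocr(input_pdf, output_pdf, renderer="sandwich"):
    """Run ocrmypdf; assumes it is installed and on PATH."""
    subprocess.run(build_ocr_command(input_pdf, output_pdf, renderer), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_ocr_command("input.pdf", "output.pdf")))
```

Passing renderer="auto" instead gives the variant that the OCRmyPDF warning in the log recommends.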

Jan 11 '24 04:01 mikevshu

2024-01-11 03:34:10,397 INFO o.s.w.s.DispatcherServlet [http-nio-8080-exec-1] Initializing Servlet 'dispatcherServlet'
2024-01-11 03:34:10,400 INFO o.s.w.s.DispatcherServlet [http-nio-8080-exec-1] Completed initialization in 2 ms
2024-01-11 03:35:09,624 INFO s.s.S.u.ProcessExecutor [http-nio-8080-exec-3] Running command: ocrmypdf --verbose 2 --output-type pdf --pdf-renderer hocr --skip-text --language chi_sim+eng /tmp/input_5434599845335999770.pdf /tmp/output_17761550626375076767.pdf
2024-01-11 03:35:10,090 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf - ocrmypdf 15.4.4
2024-01-11 03:35:10,090 INFO s.s.S.u.ProcessExecutor [Thread-1] WARNING ocrmypdf._validation - The 'hocr' PDF renderer is known to cause problems with one or more of the languages in your document. Use --pdf-renderer auto (the default) to avoid this issue.
2024-01-11 03:35:10,090 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
2024-01-11 03:35:10,106 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Found tesseract 5.3.2
2024-01-11 03:35:10,107 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
2024-01-11 03:35:10,123 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
2024-01-11 03:35:10,140 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Found gs 9.55.0
2024-01-11 03:35:10,142 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
2024-01-11 03:35:10,158 INFO s.s.S.u.ProcessExecutor [Thread-1] WARNING ocrmypdf.builtin_plugins.ghostscript - The installed version of Ghostscript 9.55.0, contains a remote code execution security vulnerability. Please upgrade to a newer version. For details see CVE-2023-43115. The issue is not known to affect OCRmyPDF or processing PDFs with Ghostscript, but upgrading Ghostscript is recommended.
2024-01-11 03:35:10,159 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
2024-01-11 03:35:10,184 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (3):
2024-01-11 03:35:10,186 INFO s.s.S.u.ProcessExecutor [Thread-1] chi_sim
2024-01-11 03:35:10,188 INFO s.s.S.u.ProcessExecutor [Thread-1] eng
2024-01-11 03:35:10,189 INFO s.s.S.u.ProcessExecutor [Thread-1] osd
2024-01-11 03:35:10,190 INFO s.s.S.u.ProcessExecutor [Thread-1]
2024-01-11 03:35:10,195 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - pikepdf mmap enabled
2024-01-11 03:35:10,195 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - os.symlink(/tmp/input_5434599845335999770.pdf, /tmp/ocrmypdf.io._ji5fyow/origin)
2024-01-11 03:35:10,195 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io._ji5fyow/origin, /tmp/ocrmypdf.io._ji5fyow/origin.pdf)
2024-01-11 03:35:10,196 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG root - Gathering info with 1 thread workers
2024-01-11 03:35:10,196 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - pikepdf mmap enabled
2024-01-11 03:35:10,204 INFO s.s.S.u.ProcessExecutor [Thread-1]
2024-01-11 03:35:10,205 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.builtin_plugins.tesseract_ocr - Using Tesseract OpenMP thread limit 1
2024-01-11 03:35:10,206 INFO s.s.S.u.ProcessExecutor [Thread-1] INFO ocrmypdf._pipelines.ocr - Start processing 2 pages concurrently
2024-01-11 03:35:10,208 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - pikepdf mmap enabled
2024-01-11 03:35:10,209 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 1 Rasterize with png16m, rotation 0
2024-01-11 03:35:10,211 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - pikepdf mmap enabled
2024-01-11 03:35:10,212 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 2 Rasterize with png16m, rotation 0
2024-01-11 03:35:10,213 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 2 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=2', '-dLastPage=2', '-r293.844399x293.844399', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io._ji5fyow/origin.pdf']
2024-01-11 03:35:10,216 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 1 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=1', '-dLastPage=1', '-r293.844399x293.844399', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io._ji5fyow/origin.pdf']
2024-01-11 03:35:11,359 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'IHDR' 16 13
2024-01-11 03:35:11,360 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'iCCP' 41 2354
2024-01-11 03:35:11,360 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 iCCP profile name b'default_rgb.icc'
2024-01-11 03:35:11,361 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 Compression method 0
2024-01-11 03:35:11,362 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'pHYs' 2407 9
2024-01-11 03:35:11,362 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'tEXt' 2428 31
2024-01-11 03:35:11,363 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'IDAT' 2471 8192
2024-01-11 03:35:11,363 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._exec.ghostscript - 2 Rotating output by 0
2024-01-11 03:35:11,502 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'IHDR' 16 13
2024-01-11 03:35:11,502 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'iCCP' 41 2354
2024-01-11 03:35:11,503 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 iCCP profile name b'default_rgb.icc'
2024-01-11 03:35:11,503 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 Compression method 0
2024-01-11 03:35:11,503 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'pHYs' 2407 9
2024-01-11 03:35:11,504 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'tEXt' 2428 31
2024-01-11 03:35:11,505 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'IDAT' 2471 8192
2024-01-11 03:35:12,145 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'IHDR' 16 13
2024-01-11 03:35:12,146 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'iCCP' 41 2350
2024-01-11 03:35:12,146 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 iCCP profile name b'ICC Profile'
2024-01-11 03:35:12,147 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 Compression method 0
2024-01-11 03:35:12,152 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'pHYs' 2403 9
2024-01-11 03:35:12,152 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'IDAT' 2424 65536
2024-01-11 03:35:12,152 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 2 resolution (293.8526, 293.8526)
2024-01-11 03:35:12,380 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'IHDR' 16 13
2024-01-11 03:35:12,381 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'iCCP' 41 2350
2024-01-11 03:35:12,381 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 iCCP profile name b'ICC Profile'
2024-01-11 03:35:12,381 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 Compression method 0
2024-01-11 03:35:12,383 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'pHYs' 2403 9
2024-01-11 03:35:12,383 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'IDAT' 2424 65536
2024-01-11 03:35:12,384 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 1 resolution (293.8526, 293.8526)
2024-01-11 03:35:12,967 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 2 Running: ['tesseract', '-l', 'chi_sim+eng', '/tmp/ocrmypdf.io._ji5fyow/000002_ocr.png', '/tmp/ocrmypdf.io._ji5fyow/000002_ocr_hocr', 'hocr', 'txt']
2024-01-11 03:35:13,276 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 1 Running: ['tesseract', '-l', 'chi_sim+eng', '/tmp/ocrmypdf.io._ji5fyow/000001_ocr.png', '/tmp/ocrmypdf.io._ji5fyow/000001_ocr_hocr', 'hocr', 'txt']
2024-01-11 03:35:27,326 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 3 Rasterize with png16m, rotation 0
2024-01-11 03:35:27,328 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 3 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=3', '-dLastPage=3', '-r293.844399x293.844399', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io._ji5fyow/origin.pdf']
2024-01-11 03:35:27,334 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 2 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
2024-01-11 03:35:27,334 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 2 Grafting
2024-01-11 03:35:27,340 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 2 Page rotation: (content, auto) -> page = (0, 0) -> 0
2024-01-11 03:35:28,594 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'IHDR' 16 13
2024-01-11 03:35:28,595 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'iCCP' 41 2354
2024-01-11 03:35:28,596 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 iCCP profile name b'default_rgb.icc'
2024-01-11 03:35:28,598 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 Compression method 0
2024-01-11 03:35:28,599 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'pHYs' 2407 9
2024-01-11 03:35:28,600 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'tEXt' 2428 31
2024-01-11 03:35:28,601 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'IDAT' 2471 8192
2024-01-11 03:35:29,547 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'IHDR' 16 13
2024-01-11 03:35:29,547 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'iCCP' 41 2350
2024-01-11 03:35:29,548 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 iCCP profile name b'ICC Profile'
2024-01-11 03:35:29,550 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 Compression method 0
2024-01-11 03:35:29,550 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'pHYs' 2403 9
2024-01-11 03:35:29,552 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'IDAT' 2424 65536
2024-01-11 03:35:29,553 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 3 resolution (293.8526, 293.8526)
2024-01-11 03:35:30,452 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 3 Running: ['tesseract', '-l', 'chi_sim+eng', '/tmp/ocrmypdf.io._ji5fyow/000003_ocr.png', '/tmp/ocrmypdf.io._ji5fyow/000003_ocr_hocr', 'hocr', 'txt']
2024-01-11 03:35:36,731 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 4 Rasterize with png16m, rotation 0
2024-01-11 03:35:36,731 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 4 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=4', '-dLastPage=4', '-r293.844399x293.844399', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io._ji5fyow/origin.pdf']
2024-01-11 03:35:36,732 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 1 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
2024-01-11 03:35:36,732 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 1 Grafting
2024-01-11 03:35:36,734 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 1 Page rotation: (content, auto) -> page = (0, 0) -> 0
2024-01-11 03:35:37,796 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'IHDR' 16 13
2024-01-11 03:35:37,797 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'iCCP' 41 2354
2024-01-11 03:35:37,797 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 iCCP profile name b'default_rgb.icc'
2024-01-11 03:35:37,797 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 Compression method 0
2024-01-11 03:35:37,797 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'pHYs' 2407 9
2024-01-11 03:35:37,798 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'tEXt' 2428 31
2024-01-11 03:35:37,798 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'IDAT' 2471 8192
2024-01-11 03:35:38,496 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'IHDR' 16 13
2024-01-11 03:35:38,496 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'iCCP' 41 2350
2024-01-11 03:35:38,496 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 iCCP profile name b'ICC Profile'
2024-01-11 03:35:38,497 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 Compression method 0
2024-01-11 03:35:38,497 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'pHYs' 2403 9
2024-01-11 03:35:38,497 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'IDAT' 2424 65536
2024-01-11 03:35:38,497 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 4 resolution (293.8526, 293.8526)
2024-01-11 03:35:39,185 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 4 Running: ['tesseract', '-l', 'chi_sim+eng', '/tmp/ocrmypdf.io._ji5fyow/000004_ocr.png', '/tmp/ocrmypdf.io._ji5fyow/000004_ocr_hocr', 'hocr', 'txt']
2024-01-11 03:35:39,646 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 3 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
2024-01-11 03:35:39,647 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 3 Grafting
2024-01-11 03:35:39,647 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 3 Page rotation: (content, auto) -> page = (0, 0) -> 0
2024-01-11 03:35:45,840 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 4 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
2024-01-11 03:35:45,840 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 4 Grafting
2024-01-11 03:35:45,842 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 4 Page rotation: (content, auto) -> page = (0, 0) -> 0
2024-01-11 03:35:45,844 INFO s.s.S.u.ProcessExecutor [Thread-1]
2024-01-11 03:35:45,860 INFO s.s.S.u.ProcessExecutor [Thread-1] INFO ocrmypdf._pipelines.ocr - Postprocessing...
2024-01-11 03:35:45,863 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
2024-01-11 03:35:45,908 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-UHRZQ9HQ5X533U0JuqpMnw in page 0
2024-01-11 03:35:45,909 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 23: treating as an optimization candidate
2024-01-11 03:35:45,910 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-oCuAfBVpeysrcI5E9oCg0Q in page 1
2024-01-11 03:35:45,911 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 25: treating as an optimization candidate
2024-01-11 03:35:45,912 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-BOcqWvWNgLbYwba-X5JEYA in page 2
2024-01-11 03:35:45,913 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 27: treating as an optimization candidate
2024-01-11 03:35:45,915 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-vOoHdTKk2akf1cKO1c0suA in page 3
2024-01-11 03:35:45,916 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 29: treating as an optimization candidate
2024-01-11 03:35:46,569 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - XrefExt(xref=25, ext='.png')
2024-01-11 03:35:47,331 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - XrefExt(xref=27, ext='.png')
2024-01-11 03:35:47,896 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - XrefExt(xref=29, ext='.png')
2024-01-11 03:35:48,631 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - XrefExt(xref=23, ext='.png')
2024-01-11 03:35:48,632 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Optimizable images: JPEGs: 0 PNGs: 4
2024-01-11 03:35:48,633 INFO s.s.S.u.ProcessExecutor [Thread-1]
2024-01-11 03:35:48,634 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-UHRZQ9HQ5X533U0JuqpMnw in page 0
2024-01-11 03:35:48,634 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 23: treating as an optimization candidate
2024-01-11 03:35:48,635 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-oCuAfBVpeysrcI5E9oCg0Q in page 1
2024-01-11 03:35:48,635 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 25: treating as an optimization candidate
2024-01-11 03:35:48,635 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-BOcqWvWNgLbYwba-X5JEYA in page 2
2024-01-11 03:35:48,635 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 27: treating as an optimization candidate
2024-01-11 03:35:48,636 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-vOoHdTKk2akf1cKO1c0suA in page 3
2024-01-11 03:35:48,636 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 29: treating as an optimization candidate
2024-01-11 03:35:48,637 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 25: marking this JPEG as deflatable
2024-01-11 03:35:48,637 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 27: marking this JPEG as deflatable
2024-01-11 03:35:48,638 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 29: marking this JPEG as deflatable
2024-01-11 03:35:48,639 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 23: marking this JPEG as deflatable
2024-01-11 03:35:48,670 INFO s.s.S.u.ProcessExecutor [Thread-1]
2024-01-11 03:35:48,670 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-UHRZQ9HQ5X533U0JuqpMnw in page 0
2024-01-11 03:35:48,672 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 23: treating as an optimization candidate
2024-01-11 03:35:48,672 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-oCuAfBVpeysrcI5E9oCg0Q in page 1
2024-01-11 03:35:48,673 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 25: treating as an optimization candidate
2024-01-11 03:35:48,673 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-BOcqWvWNgLbYwba-X5JEYA in page 2
2024-01-11 03:35:48,673 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 27: treating as an optimization candidate
2024-01-11 03:35:48,673 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-vOoHdTKk2akf1cKO1c0suA in page 3
2024-01-11 03:35:48,674 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 29: treating as an optimization candidate
2024-01-11 03:35:48,674 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 25: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
2024-01-11 03:35:48,675 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 27: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
2024-01-11 03:35:48,677 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 29: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
2024-01-11 03:35:48,678 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 23: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
2024-01-11 03:35:48,679 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Optimizable images: JBIG2 groups: 0
2024-01-11 03:35:48,680 INFO s.s.S.u.ProcessExecutor [Thread-1]
2024-01-11 03:35:48,724 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io._ji5fyow/optimize.opt.pdf, /tmp/ocrmypdf.io._ji5fyow/optimize.pdf)
2024-01-11 03:35:48,725 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['jbig2', '--version']
2024-01-11 03:35:48,726 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['pngquant', '--version']
2024-01-11 03:35:48,727 INFO s.s.S.u.ProcessExecutor [Thread-1] INFO ocrmypdf._pipeline - Image optimization ratio: 1.54 savings: 35.1%
2024-01-11 03:35:48,728 INFO s.s.S.u.ProcessExecutor [Thread-1] INFO ocrmypdf._pipeline - Total file size ratio: 1.52 savings: 34.2%
2024-01-11 03:35:48,730 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - /tmp/ocrmypdf.io._ji5fyow/optimize.pdf -> /tmp/output_17761550626375076767.pdf

root@OpenWrt:/opt/volumes/s-pdf/logs# tac info.log
-ash: tac: not found
root@OpenWrt:/opt/volumes/s-pdf/logs# cat info.log.bak
2024-01-11 03:33:57,775 INFO s.s.S.SPdfApplication [main] Starting SPdfApplication using Java 17.0.9 with PID 1 (/app.jar started by root in /)
2024-01-11 03:33:57,788 INFO s.s.S.SPdfApplication [main] No active profile set, falling back to 1 default profile: "default"
2024-01-11 03:34:02,322 INFO o.s.b.w.e.t.TomcatWebServer [main] Tomcat initialized with port 8080 (http)
2024-01-11 03:34:02,345 INFO o.a.c.h.Http11NioProtocol [main] Initializing ProtocolHandler ["http-nio-8080"]
2024-01-11 03:34:02,349 INFO o.a.c.c.StandardService [main] Starting service [Tomcat]
2024-01-11 03:34:02,350 INFO o.a.c.c.StandardEngine [main] Starting Servlet engine: [Apache Tomcat/10.1.17]
2024-01-11 03:34:02,463 INFO o.a.c.c.C.[.[.[/] [main] Initializing Spring embedded WebApplicationContext
2024-01-11 03:34:02,467 INFO o.s.b.w.s.c.ServletWebServerApplicationContext [main] Root WebApplicationContext: initialization completed in 4406 ms
2024-01-11 03:34:03,010 INFO s.s.S.c.EndpointConfiguration [main] Disabling pdf-to-book
2024-01-11 03:34:03,013 INFO s.s.S.c.EndpointConfiguration [main] Disabling book-to-pdf
2024-01-11 03:34:03,022 INFO s.s.S.c.PostStartupProcesses [main] No custom apps to install.
2024-01-11 03:34:05,566 INFO o.s.b.a.e.w.EndpointLinksResolver [main] Exposing 1 endpoint(s) beneath base path '/actuator'
2024-01-11 03:34:05,695 INFO o.a.c.h.Http11NioProtocol [main] Starting ProtocolHandler ["http-nio-8080"]
2024-01-11 03:34:05,744 INFO o.s.b.w.e.t.TomcatWebServer [main] Tomcat started on port 8080 (http) with context path ''
2024-01-11 03:34:05,787 INFO s.s.S.SPdfApplication [main] Started SPdfApplication in 9.559 seconds (process running for 11.344)
2024-01-11 03:34:10,393 INFO o.a.c.c.C.[.[.[/] [http-nio-8080-exec-1] Initializing Spring DispatcherServlet 'dispatcherServlet'
2024-01-11 03:34:10,397 INFO o.s.w.s.DispatcherServlet [http-nio-8080-exec-1] Initializing Servlet 'dispatcherServlet'
2024-01-11 03:34:10,400 INFO o.s.w.s.DispatcherServlet [http-nio-8080-exec-1] Completed initialization in 2 ms
2024-01-11 03:35:09,624 INFO s.s.S.u.ProcessExecutor [http-nio-8080-exec-3] Running command: ocrmypdf --verbose 2 --output-type pdf --pdf-renderer hocr --skip-text --language chi_sim+eng /tmp/input_5434599845335999770.pdf /tmp/output_17761550626375076767.pdf
2024-01-11 03:35:10,090 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf - ocrmypdf 15.4.4
2024-01-11 03:35:10,090 INFO s.s.S.u.ProcessExecutor [Thread-1] WARNING ocrmypdf._validation - The 'hocr' PDF renderer is known to cause problems with one or more of the languages in your document. Use --pdf-renderer auto (the default) to avoid this issue.
2024-01-11 03:35:10,090 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
2024-01-11 03:35:10,106 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Found tesseract 5.3.2
2024-01-11 03:35:10,107 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
2024-01-11 03:35:10,123 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
2024-01-11 03:35:10,140 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Found gs 9.55.0
2024-01-11 03:35:10,142 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
2024-01-11 03:35:10,158 INFO s.s.S.u.ProcessExecutor [Thread-1] WARNING ocrmypdf.builtin_plugins.ghostscript - The installed version of Ghostscript 9.55.0, contains a remote code execution security vulnerability. Please upgrade to a newer version. For details see CVE-2023-43115. The issue is not known to affect OCRmyPDF or processing PDFs with Ghostscript, but upgrading Ghostscript is recommended.
2024-01-11 03:35:10,159 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
2024-01-11 03:35:10,184 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (3):
2024-01-11 03:35:10,186 INFO s.s.S.u.ProcessExecutor [Thread-1] chi_sim
2024-01-11 03:35:10,188 INFO s.s.S.u.ProcessExecutor [Thread-1] eng
2024-01-11 03:35:10,189 INFO s.s.S.u.ProcessExecutor [Thread-1] osd
2024-01-11 03:35:10,190 INFO s.s.S.u.ProcessExecutor [Thread-1]
2024-01-11 03:35:10,195 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - pikepdf mmap enabled
2024-01-11 03:35:10,195 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - os.symlink(/tmp/input_5434599845335999770.pdf, /tmp/ocrmypdf.io._ji5fyow/origin)
2024-01-11 03:35:10,195 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io._ji5fyow/origin, /tmp/ocrmypdf.io._ji5fyow/origin.pdf)
2024-01-11 03:35:10,196 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG root - Gathering info with 1 thread workers
2024-01-11 03:35:10,196 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - pikepdf mmap enabled
2024-01-11 03:35:10,204 INFO s.s.S.u.ProcessExecutor [Thread-1]
2024-01-11 03:35:10,205 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.builtin_plugins.tesseract_ocr - Using Tesseract OpenMP thread limit 1
2024-01-11 03:35:10,206 INFO s.s.S.u.ProcessExecutor [Thread-1] INFO ocrmypdf._pipelines.ocr - Start processing 2 pages concurrently
2024-01-11 03:35:10,208 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - pikepdf mmap enabled
2024-01-11 03:35:10,209 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 1 Rasterize with png16m, rotation 0
2024-01-11 03:35:10,211 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - pikepdf mmap enabled
2024-01-11 03:35:10,212 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 2 Rasterize with png16m, rotation 0
2024-01-11 03:35:10,213 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 2 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=2', '-dLastPage=2', '-r293.844399x293.844399', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io._ji5fyow/origin.pdf']
2024-01-11 03:35:10,216 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 1 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=1', '-dLastPage=1', '-r293.844399x293.844399', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io._ji5fyow/origin.pdf']
2024-01-11 03:35:11,359 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'IHDR' 16 13
2024-01-11 03:35:11,360 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'iCCP' 41 2354
2024-01-11 03:35:11,360 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 iCCP profile name b'default_rgb.icc'
2024-01-11 03:35:11,361 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 Compression method 0
2024-01-11 03:35:11,362 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'pHYs' 2407 9
2024-01-11 03:35:11,362 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'tEXt' 2428 31
2024-01-11 03:35:11,363 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'IDAT' 2471 8192
2024-01-11 03:35:11,363 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._exec.ghostscript - 2 Rotating output by 0
2024-01-11 03:35:11,502 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'IHDR' 16 13
2024-01-11 03:35:11,502 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'iCCP' 41 2354
2024-01-11 03:35:11,503 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 iCCP profile name b'default_rgb.icc'
2024-01-11 03:35:11,503 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 Compression method 0
2024-01-11 03:35:11,503 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'pHYs' 2407 9
2024-01-11 03:35:11,504 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'tEXt' 2428 31
2024-01-11 03:35:11,505 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'IDAT' 2471 8192
2024-01-11 03:35:12,145 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'IHDR' 16 13
2024-01-11 03:35:12,146 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'iCCP' 41 2350
2024-01-11 03:35:12,146 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 iCCP profile name b'ICC Profile'
2024-01-11 03:35:12,147 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 Compression method 0
2024-01-11 03:35:12,152 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'pHYs' 2403 9
2024-01-11 03:35:12,152 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 2 STREAM b'IDAT' 2424 65536
2024-01-11 03:35:12,152 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 2 resolution (293.8526, 293.8526)
2024-01-11 03:35:12,380 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'IHDR' 16 13
2024-01-11 03:35:12,381 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'iCCP' 41 2350
2024-01-11 03:35:12,381 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 iCCP profile name b'ICC Profile'
2024-01-11 03:35:12,381 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 Compression method 0
2024-01-11 03:35:12,383 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'pHYs' 2403 9
2024-01-11 03:35:12,383 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 1 STREAM b'IDAT' 2424 65536
2024-01-11 03:35:12,384 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 1 resolution (293.8526, 293.8526)
2024-01-11 03:35:12,967 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 2 Running: ['tesseract', '-l', 'chi_sim+eng', '/tmp/ocrmypdf.io._ji5fyow/000002_ocr.png', '/tmp/ocrmypdf.io._ji5fyow/000002_ocr_hocr', 'hocr', 'txt']
2024-01-11 03:35:13,276 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 1 Running: ['tesseract', '-l', 'chi_sim+eng', '/tmp/ocrmypdf.io._ji5fyow/000001_ocr.png', '/tmp/ocrmypdf.io._ji5fyow/000001_ocr_hocr', 'hocr', 'txt']
2024-01-11 03:35:27,326 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 3 Rasterize with png16m, rotation 0
2024-01-11 03:35:27,328 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 3 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=3', '-dLastPage=3', '-r293.844399x293.844399', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io._ji5fyow/origin.pdf']
2024-01-11 03:35:27,334 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 2 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
2024-01-11 03:35:27,334 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 2 Grafting
2024-01-11 03:35:27,340 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 2 Page rotation: (content, auto) -> page = (0, 0) -> 0
2024-01-11 03:35:28,594 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'IHDR' 16 13
2024-01-11 03:35:28,595 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'iCCP' 41 2354
2024-01-11 03:35:28,596 INFO s.s.S.u.ProcessExecutor
[Thread-1] DEBUG PIL.PngImagePlugin - 3 iCCP profile name b'default_rgb.icc' 2024-01-11 03:35:28,598 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 Compression method 0 2024-01-11 03:35:28,599 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'pHYs' 2407 9 2024-01-11 03:35:28,600 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'tEXt' 2428 31 2024-01-11 03:35:28,601 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'IDAT' 2471 8192 2024-01-11 03:35:29,547 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'IHDR' 16 13 2024-01-11 03:35:29,547 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'iCCP' 41 2350 2024-01-11 03:35:29,548 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 iCCP profile name b'ICC Profile' 2024-01-11 03:35:29,550 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 Compression method 0 2024-01-11 03:35:29,550 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'pHYs' 2403 9 2024-01-11 03:35:29,552 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 3 STREAM b'IDAT' 2424 65536 2024-01-11 03:35:29,553 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 3 resolution (293.8526, 293.8526) 2024-01-11 03:35:30,452 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 3 Running: ['tesseract', '-l', 'chi_sim+eng', '/tmp/ocrmypdf.io._ji5fyow/000003_ocr.png', '/tmp/ocrmypdf.io._ji5fyow/000003_ocr_hocr', 'hocr', 'txt'] 2024-01-11 03:35:36,731 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 4 Rasterize with png16m, rotation 0 2024-01-11 03:35:36,731 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 4 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=4', '-dLastPage=4', '-r293.844399x293.844399', 
'-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io._ji5fyow/origin.pdf'] 2024-01-11 03:35:36,732 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 1 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0 2024-01-11 03:35:36,732 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 1 Grafting 2024-01-11 03:35:36,734 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 1 Page rotation: (content, auto) -> page = (0, 0) -> 0 2024-01-11 03:35:37,796 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'IHDR' 16 13 2024-01-11 03:35:37,797 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'iCCP' 41 2354 2024-01-11 03:35:37,797 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 iCCP profile name b'default_rgb.icc' 2024-01-11 03:35:37,797 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 Compression method 0 2024-01-11 03:35:37,797 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'pHYs' 2407 9 2024-01-11 03:35:37,798 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'tEXt' 2428 31 2024-01-11 03:35:37,798 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'IDAT' 2471 8192 2024-01-11 03:35:38,496 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'IHDR' 16 13 2024-01-11 03:35:38,496 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'iCCP' 41 2350 2024-01-11 03:35:38,496 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 iCCP profile name b'ICC Profile' 2024-01-11 03:35:38,497 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 Compression method 0 2024-01-11 03:35:38,497 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'pHYs' 2403 9 2024-01-11 03:35:38,497 INFO s.s.S.u.ProcessExecutor 
[Thread-1] DEBUG PIL.PngImagePlugin - 4 STREAM b'IDAT' 2424 65536 2024-01-11 03:35:38,497 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - 4 resolution (293.8526, 293.8526) 2024-01-11 03:35:39,185 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - 4 Running: ['tesseract', '-l', 'chi_sim+eng', '/tmp/ocrmypdf.io._ji5fyow/000004_ocr.png', '/tmp/ocrmypdf.io._ji5fyow/000004_ocr_hocr', 'hocr', 'txt'] 2024-01-11 03:35:39,646 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 3 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0 2024-01-11 03:35:39,647 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 3 Grafting 2024-01-11 03:35:39,647 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 3 Page rotation: (content, auto) -> page = (0, 0) -> 0 2024-01-11 03:35:45,840 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 4 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0 2024-01-11 03:35:45,840 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 4 Grafting 2024-01-11 03:35:45,842 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._graft - 4 Page rotation: (content, auto) -> page = (0, 0) -> 0 2024-01-11 03:35:45,844 INFO s.s.S.u.ProcessExecutor [Thread-1] 2024-01-11 03:35:45,860 INFO s.s.S.u.ProcessExecutor [Thread-1] INFO ocrmypdf._pipelines.ocr - Postprocessing... 
2024-01-11 03:35:45,863 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version'] 2024-01-11 03:35:45,908 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-UHRZQ9HQ5X533U0JuqpMnw in page 0 2024-01-11 03:35:45,909 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 23: treating as an optimization candidate 2024-01-11 03:35:45,910 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-oCuAfBVpeysrcI5E9oCg0Q in page 1 2024-01-11 03:35:45,911 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 25: treating as an optimization candidate 2024-01-11 03:35:45,912 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-BOcqWvWNgLbYwba-X5JEYA in page 2 2024-01-11 03:35:45,913 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 27: treating as an optimization candidate 2024-01-11 03:35:45,915 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-vOoHdTKk2akf1cKO1c0suA in page 3 2024-01-11 03:35:45,916 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 29: treating as an optimization candidate 2024-01-11 03:35:46,569 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - XrefExt(xref=25, ext='.png') 2024-01-11 03:35:47,331 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - XrefExt(xref=27, ext='.png') 2024-01-11 03:35:47,896 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - XrefExt(xref=29, ext='.png') 2024-01-11 03:35:48,631 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - XrefExt(xref=23, ext='.png') 2024-01-11 03:35:48,632 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Optimizable images: JPEGs: 0 PNGs: 4 2024-01-11 03:35:48,633 INFO s.s.S.u.ProcessExecutor [Thread-1] 2024-01-11 03:35:48,634 INFO 
s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-UHRZQ9HQ5X533U0JuqpMnw in page 0 2024-01-11 03:35:48,634 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 23: treating as an optimization candidate 2024-01-11 03:35:48,635 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-oCuAfBVpeysrcI5E9oCg0Q in page 1 2024-01-11 03:35:48,635 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 25: treating as an optimization candidate 2024-01-11 03:35:48,635 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-BOcqWvWNgLbYwba-X5JEYA in page 2 2024-01-11 03:35:48,635 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 27: treating as an optimization candidate 2024-01-11 03:35:48,636 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-vOoHdTKk2akf1cKO1c0suA in page 3 2024-01-11 03:35:48,636 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 29: treating as an optimization candidate 2024-01-11 03:35:48,637 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 25: marking this JPEG as deflatable 2024-01-11 03:35:48,637 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 27: marking this JPEG as deflatable 2024-01-11 03:35:48,638 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 29: marking this JPEG as deflatable 2024-01-11 03:35:48,639 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 23: marking this JPEG as deflatable 2024-01-11 03:35:48,670 INFO s.s.S.u.ProcessExecutor [Thread-1] 2024-01-11 03:35:48,670 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-UHRZQ9HQ5X533U0JuqpMnw in page 0 2024-01-11 03:35:48,672 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 23: treating as 
an optimization candidate 2024-01-11 03:35:48,672 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-oCuAfBVpeysrcI5E9oCg0Q in page 1 2024-01-11 03:35:48,673 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 25: treating as an optimization candidate 2024-01-11 03:35:48,673 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-BOcqWvWNgLbYwba-X5JEYA in page 2 2024-01-11 03:35:48,673 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 27: treating as an optimization candidate 2024-01-11 03:35:48,673 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-vOoHdTKk2akf1cKO1c0suA in page 3 2024-01-11 03:35:48,674 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 29: treating as an optimization candidate 2024-01-11 03:35:48,674 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 25: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization 2024-01-11 03:35:48,675 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 27: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization 2024-01-11 03:35:48,677 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 29: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization 2024-01-11 03:35:48,678 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - xref 23: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization 2024-01-11 03:35:48,679 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.optimize - Optimizable images: JBIG2 groups: 0 2024-01-11 03:35:48,680 INFO s.s.S.u.ProcessExecutor [Thread-1] 2024-01-11 03:35:48,724 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io._ji5fyow/optimize.opt.pdf, /tmp/ocrmypdf.io._ji5fyow/optimize.pdf) 
2024-01-11 03:35:48,725 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['jbig2', '--version'] 2024-01-11 03:35:48,726 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf.subprocess - Running: ['pngquant', '--version'] 2024-01-11 03:35:48,727 INFO s.s.S.u.ProcessExecutor [Thread-1] INFO ocrmypdf._pipeline - Image optimization ratio: 1.54 savings: 35.1% 2024-01-11 03:35:48,728 INFO s.s.S.u.ProcessExecutor [Thread-1] INFO ocrmypdf._pipeline - Total file size ratio: 1.52 savings: 34.2% 2024-01-11 03:35:48,730 INFO s.s.S.u.ProcessExecutor [Thread-1] DEBUG ocrmypdf._pipeline - /tmp/ocrmypdf.io._ji5fyow/optimize.pdf -> /tmp/output_17761550626375076767.pdf

mikevshu avatar Jan 11 '24 04:01 mikevshu

Have you installed the language packs? https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToUseOCR.md

sbplat avatar Jan 11 '24 05:01 sbplat

> Have you installed the language packs? https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToUseOCR.md

The problem is that OCR recognizes the Chinese characters correctly (the output text file shows them), but in the output PDF every Chinese character comes out as "n".

The Chinese language pack is installed; otherwise the recognition result would not contain correct Chinese text.

CrazyBunQnQ avatar Jan 11 '24 05:01 CrazyBunQnQ

> Have you installed the language packs? https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToUseOCR.md

yes
image

mikevshu avatar Jan 11 '24 06:01 mikevshu

> > Have you installed the language packs? https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToUseOCR.md
>
> The problem is that OCR recognizes the Chinese characters correctly (the output text file shows them), but in the output PDF every Chinese character comes out as "n".
>
> The Chinese language pack is installed; otherwise the recognition result would not contain correct Chinese text.

Yes. If the language pack were not installed, only English would be listed under "Select languages that are to be detected within the PDF (Ones listed are the ones currently detected):".
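
For anyone who wants to check this outside the web UI, here is a minimal sketch that asks Tesseract directly which language packs it can see. It assumes the `tesseract` binary is on the `PATH` inside the container. The helper names `parse_list_langs` and `installed_languages` are made up for illustration, and the header text the parser skips is an assumption based on typical `tesseract --list-langs` output.

```python
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Turn the text printed by `tesseract --list-langs` into language codes."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the listing starts with a header line such as
    # 'List of available languages in "/usr/share/tessdata/" (3):'.
    return [
        line for line in lines
        if not line.startswith("List of available languages")
    ]


def installed_languages() -> list[str]:
    """Ask the local tesseract binary which language packs it can find."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Read both streams in case the listing is not written to stdout.
    return parse_list_langs(result.stdout + result.stderr)
```

If `"chi_sim" in installed_languages()` is true, the language pack is present and the "n" output has to come from somewhere else in the pipeline.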

mikevshu avatar Jan 11 '24 06:01 mikevshu

> I also encountered this problem. The output text file is normal Chinese text, but the Chinese text copied in the PDF is all n. However, selecting "Sandwich" in the rendering type can recognize the Chinese characters in the output, but it may not be accurate enough.

I don't really understand the difference between these two rendering modes, but the accuracy of the recognized text should be the same, right? Both are results recognized by `tesseract-ocr`.

And it works for me! Thanks a lot!

CrazyBunQnQ avatar Jan 11 '24 07:01 CrazyBunQnQ

> > I also encountered this problem. The output text file is normal Chinese text, but the Chinese text copied in the PDF is all n. However, selecting "Sandwich" in the rendering type can recognize the Chinese characters in the output, but it may not be accurate enough.
>
> I don't really understand the difference between these two rendering modes, but the accuracy of the recognized text should be the same, right? Both are results recognized by `tesseract-ocr`.
>
> And it works for me! Thanks a lot!

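Case solved! The hOCR renderer supports only Latin (Roman) alphabet text and does not support Chinese. That is consistent with the ocrmypdf warning in the log above, which says the 'hocr' PDF renderer is known to cause problems with one or more of the selected languages. This post can be deleted. The conclusion came from ChatGPT.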
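
To reproduce the workaround outside the web UI, here is a minimal sketch that rebuilds the ocrmypdf command from the log above with the renderer swapped from `hocr` to `sandwich`. The flags are copied from the logged command. The helper name `build_ocr_command`, the file names, and the set of accepted renderer values (only the three named in this thread) are illustrative assumptions, not Stirling-PDF's actual code.

```python
# Renderer names that appear in this thread and in the ocrmypdf warning.
KNOWN_RENDERERS = {"auto", "hocr", "sandwich"}


def build_ocr_command(input_pdf, output_pdf, languages, renderer="sandwich"):
    """Build an ocrmypdf argument list mirroring the command in the log."""
    if renderer not in KNOWN_RENDERERS:
        raise ValueError(f"unexpected renderer: {renderer!r}")
    return [
        "ocrmypdf",
        "--output-type", "pdf",
        "--pdf-renderer", renderer,          # the log used "hocr" here
        "--skip-text",
        "--language", "+".join(languages),   # e.g. "chi_sim+eng"
        input_pdf,
        output_pdf,
    ]
```

Passing the returned list to `subprocess.run(..., check=True)` would run the OCR. Building it as a list rather than a shell string sidesteps quoting problems with file names.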

mikevshu avatar Jan 11 '24 07:01 mikevshu