Stirling-PDF
OCR Error, No such file or directory: '/tmp/ocrmypdf.io.fez4ih5m/000001_ocr_hocr.hocr'
I feel like I'm missing something obvious, but I can't quite pin it down. No matter which language or which PDF I use, I get the following error.
java.io.IOException: Command process failed with exit code 15. Error message: DEBUG ocrmypdf - ocrmypdf 16.1.1
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Found tesseract 5.3.4
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Found gs 10.2.1
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = [DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 0.261972
[DS] Selected Device[1]: "(null)" (Native)
List of available languages in "/usr/share/tessdata/" (41):
deu
deu_frak
deu_latf
eng
script/Arabic
script/Armenian
script/Bengali
script/Canadian_Aboriginal
script/Cherokee
script/Cyrillic
script/Devanagari
script/Ethiopic
script/Fraktur
script/Georgian
script/Greek
script/Gujarati
script/Gurmukhi
script/HanS
script/HanS_vert
script/HanT
script/HanT_vert
script/Hangul
script/Hangul_vert
script/Hebrew
script/Japanese
script/Japanese_vert
script/Kannada
script/Khmer
script/Lao
script/Latin
script/Malayalam
script/Myanmar
script/Oriya
script/Sinhala
script/Syriac
script/Tamil
script/Telugu
script/Thaana
script/Thai
script/Tibetan
script/Vietnamese
DEBUG ocrmypdf.helpers - pikepdf mmap enabled
DEBUG ocrmypdf.helpers - os.symlink(/tmp/input_13094943151699031899.pdf, /tmp/ocrmypdf.io.fez4ih5m/origin)
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.fez4ih5m/origin, /tmp/ocrmypdf.io.fez4ih5m/origin.pdf)
DEBUG root - Gathering info with 1 thread workers
DEBUG ocrmypdf.helpers - pikepdf mmap enabled
DEBUG ocrmypdf.builtin_plugins.tesseract_ocr - Using Tesseract OpenMP thread limit 3
DEBUG ocrmypdf.helpers - pikepdf mmap enabled
DEBUG ocrmypdf._pipeline - 1 Rasterize with pngmono, rotation 0
DEBUG ocrmypdf.subprocess - 1 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=pngmono', '-dFirstPage=1', '-dLastPage=1', '-r200.183607x200.183607', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io.fez4ih5m/origin.pdf']
DEBUG PIL.PngImagePlugin - 1 STREAM b'IHDR' 16 13
DEBUG PIL.PngImagePlugin - 1 STREAM b'iCCP' 41 2296
DEBUG PIL.PngImagePlugin - 1 iCCP profile name b'default_gray.icc'
DEBUG PIL.PngImagePlugin - 1 Compression method 0
DEBUG PIL.PngImagePlugin - 1 STREAM b'pHYs' 2349 9
DEBUG PIL.PngImagePlugin - 1 STREAM b'tEXt' 2370 32
DEBUG PIL.PngImagePlugin - 1 STREAM b'IDAT' 2414 8192
DEBUG ocrmypdf._exec.ghostscript - 1 Rotating output by 0
DEBUG PIL.PngImagePlugin - 1 STREAM b'IHDR' 16 13
DEBUG PIL.PngImagePlugin - 1 STREAM b'iCCP' 41 2291
DEBUG PIL.PngImagePlugin - 1 iCCP profile name b'ICC Profile'
DEBUG PIL.PngImagePlugin - 1 Compression method 0
DEBUG PIL.PngImagePlugin - 1 STREAM b'pHYs' 2344 9
DEBUG PIL.PngImagePlugin - 1 STREAM b'IDAT' 2365 30609
DEBUG ocrmypdf._pipeline - 1 resolution (200.1774, 200.1774)
DEBUG ocrmypdf.subprocess - 1 Running: ['tesseract', '-l', 'eng', '/tmp/ocrmypdf.io.fez4ih5m/000001_ocr.png', '/tmp/ocrmypdf.io.fez4ih5m/000001_ocr_hocr', 'hocr', 'txt']
INFO ocrmypdf._exec.tesseract - 1 [tesseract] [DS] Profile read from file (tesseract_opencl_profile_devices.dat).
INFO ocrmypdf._exec.tesseract - 1 [tesseract] [DS] Device[1] 0:(null) score is 0.261972
INFO ocrmypdf._exec.tesseract - 1 [tesseract] [DS] Selected Device[1]: "(null)" (Native)
ERROR ocrmypdf._exec.tesseract - 1 [tesseract] read_params_file: Can't open hocr
ERROR ocrmypdf._exec.tesseract - 1 [tesseract] read_params_file: Can't open txt
ERROR ocrmypdf._pipelines._common - An exception occurred while executing the pipeline
Traceback (most recent call last):
File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler
return fn(options, plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 191, in _run_pipeline
optimize_messages = exec_concurrent(context, executor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 118, in exec_concurrent
executor(
File "/usr/lib/python3.12/site-packages/ocrmypdf/_concurrent.py", line 78, in __call__
self._execute(
File "/usr/lib/python3.12/site-packages/ocrmypdf/builtin_plugins/concurrency.py", line 144, in _execute
result = future.result()
^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 82, in _exec_page_sync
ocr_out, text_out = _image_to_ocr_text(page_context, ocr_image_out)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 64, in _image_to_ocr_text
ocr_out = render_hocr_page(hocr_out, page_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 750, in render_hocr_page
if hocr.stat().st_size == 0:
^^^^^^^^^^^
File "/usr/lib/python3.12/pathlib.py", line 840, in stat
return os.stat(self, follow_symlinks=follow_symlinks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.fez4ih5m/000001_ocr_hocr.hocr'
at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:190)
at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:85)
at stirling.software.SPDF.controller.api.misc.OCRController.processPdfWithOCR(OCRController.java:148)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:255)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:188)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:925)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:830)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:547)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:614)
at org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)
at org.eclipse.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1614)
at org.eclipse.jetty.ee10.websocket.servlet.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:195)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at stirling.software.SPDF.config.MetricsFilter.doFilterInternal(MetricsFilter.java:62)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.eclipse.jetty.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1547)
at org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:814)
at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:431)
at org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:464)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:571)
at org.eclipse.jetty.ee10.servlet.SessionHandler.handle(SessionHandler.java:703)
at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:765)
at org.eclipse.jetty.server.Server.handle(Server.java:179)
at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:619)
at org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:411)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:478)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:441)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:410)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:971)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1201)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1156)
at java.base/java.lang.Thread.run(Thread.java:1583)
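
One possible lead from the log above: the two `read_params_file: Can't open hocr` and `Can't open txt` errors suggest that Tesseract could not find its `hocr` and `txt` config files, which would explain why the `.hocr` output was never written and OCRmyPDF then failed with `FileNotFoundError`. The sketch below checks for those files. It assumes the configs normally live in a `configs` subdirectory of the tessdata directory (here `/usr/share/tessdata/`, taken from the `--list-langs` output); the function name `missing_tesseract_configs` is hypothetical and not part of Stirling-PDF, OCRmyPDF, or Tesseract.

```python
from pathlib import Path


def missing_tesseract_configs(tessdata_dir, names=("hocr", "txt")):
    """Return the config names that have no file under <tessdata_dir>/configs.

    Assumption: Tesseract resolves output configs such as `hocr` and `txt`
    from a `configs` subdirectory of its tessdata directory.
    """
    configs_dir = Path(tessdata_dir) / "configs"
    return [name for name in names if not (configs_dir / name).is_file()]


if __name__ == "__main__":
    # Directory taken from the `--list-langs` output in the log above.
    missing = missing_tesseract_configs("/usr/share/tessdata")
    print("missing configs:", missing or "none")
```

If this reports `hocr` or `txt` as missing when run inside the container, the tessdata directory is probably incomplete (for example, a mounted volume that contains only language files), though that is a guess based on the log rather than a confirmed cause.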