
OCR Error, No such file or directory: '/tmp/ocrmypdf.io.fez4ih5m/000001_ocr_hocr.hocr'

Open StoiaCode opened this issue 8 months ago • 4 comments

I feel like I'm missing something obvious, but I can't quite pin it down. No matter which language or which PDF I use, I get the following error.

java.io.IOException: Command process failed with exit code 15. Error message:   DEBUG ocrmypdf - ocrmypdf 16.1.1
  DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
  DEBUG ocrmypdf.subprocess - Found tesseract 5.3.4
  DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
  DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
  DEBUG ocrmypdf.subprocess - Found gs 10.2.1
  DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
  DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
  DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = [DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 0.261972
[DS] Selected Device[1]: "(null)" (Native)
List of available languages in "/usr/share/tessdata/" (41):
deu
deu_frak
deu_latf
eng
script/Arabic
script/Armenian
script/Bengali
script/Canadian_Aboriginal
script/Cherokee
script/Cyrillic
script/Devanagari
script/Ethiopic
script/Fraktur
script/Georgian
script/Greek
script/Gujarati
script/Gurmukhi
script/HanS
script/HanS_vert
script/HanT
script/HanT_vert
script/Hangul
script/Hangul_vert
script/Hebrew
script/Japanese
script/Japanese_vert
script/Kannada
script/Khmer
script/Lao
script/Latin
script/Malayalam
script/Myanmar
script/Oriya
script/Sinhala
script/Syriac
script/Tamil
script/Telugu
script/Thaana
script/Thai
script/Tibetan
script/Vietnamese

  DEBUG ocrmypdf.helpers - pikepdf mmap enabled
  DEBUG ocrmypdf.helpers - os.symlink(/tmp/input_13094943151699031899.pdf, /tmp/ocrmypdf.io.fez4ih5m/origin)
  DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.fez4ih5m/origin, /tmp/ocrmypdf.io.fez4ih5m/origin.pdf)
  DEBUG root - Gathering info with 1 thread workers
  DEBUG ocrmypdf.helpers - pikepdf mmap enabled

  DEBUG ocrmypdf.builtin_plugins.tesseract_ocr - Using Tesseract OpenMP thread limit 3
  DEBUG ocrmypdf.helpers - pikepdf mmap enabled
  DEBUG ocrmypdf._pipeline -    1  Rasterize with pngmono, rotation 0
  DEBUG ocrmypdf.subprocess -    1  Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=pngmono', '-dFirstPage=1', '-dLastPage=1', '-r200.183607x200.183607', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io.fez4ih5m/origin.pdf']
  DEBUG PIL.PngImagePlugin -    1  STREAM b'IHDR' 16 13
  DEBUG PIL.PngImagePlugin -    1  STREAM b'iCCP' 41 2296
  DEBUG PIL.PngImagePlugin -    1  iCCP profile name b'default_gray.icc'
  DEBUG PIL.PngImagePlugin -    1  Compression method 0
  DEBUG PIL.PngImagePlugin -    1  STREAM b'pHYs' 2349 9
  DEBUG PIL.PngImagePlugin -    1  STREAM b'tEXt' 2370 32
  DEBUG PIL.PngImagePlugin -    1  STREAM b'IDAT' 2414 8192
  DEBUG ocrmypdf._exec.ghostscript -    1  Rotating output by 0
  DEBUG PIL.PngImagePlugin -    1  STREAM b'IHDR' 16 13
  DEBUG PIL.PngImagePlugin -    1  STREAM b'iCCP' 41 2291
  DEBUG PIL.PngImagePlugin -    1  iCCP profile name b'ICC Profile'
  DEBUG PIL.PngImagePlugin -    1  Compression method 0
  DEBUG PIL.PngImagePlugin -    1  STREAM b'pHYs' 2344 9
  DEBUG PIL.PngImagePlugin -    1  STREAM b'IDAT' 2365 30609
  DEBUG ocrmypdf._pipeline -    1  resolution (200.1774, 200.1774)
  DEBUG ocrmypdf.subprocess -    1  Running: ['tesseract', '-l', 'eng', '/tmp/ocrmypdf.io.fez4ih5m/000001_ocr.png', '/tmp/ocrmypdf.io.fez4ih5m/000001_ocr_hocr', 'hocr', 'txt']
   INFO ocrmypdf._exec.tesseract -    1  [tesseract] [DS] Profile read from file (tesseract_opencl_profile_devices.dat).
   INFO ocrmypdf._exec.tesseract -    1  [tesseract] [DS] Device[1] 0:(null) score is 0.261972
   INFO ocrmypdf._exec.tesseract -    1  [tesseract] [DS] Selected Device[1]: "(null)" (Native)
  ERROR ocrmypdf._exec.tesseract -    1  [tesseract] read_params_file: Can't open hocr
  ERROR ocrmypdf._exec.tesseract -    1  [tesseract] read_params_file: Can't open txt

  ERROR ocrmypdf._pipelines._common - An exception occurred while executing the pipeline
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler
    return fn(options, plugin_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 191, in _run_pipeline
    optimize_messages = exec_concurrent(context, executor)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 118, in exec_concurrent
    executor(
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_concurrent.py", line 78, in __call__
    self._execute(
  File "/usr/lib/python3.12/site-packages/ocrmypdf/builtin_plugins/concurrency.py", line 144, in _execute
    result = future.result()
             ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 82, in _exec_page_sync
    ocr_out, text_out = _image_to_ocr_text(page_context, ocr_image_out)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 64, in _image_to_ocr_text
    ocr_out = render_hocr_page(hocr_out, page_context)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 750, in render_hocr_page
    if hocr.stat().st_size == 0:
       ^^^^^^^^^^^
  File "/usr/lib/python3.12/pathlib.py", line 840, in stat
    return os.stat(self, follow_symlinks=follow_symlinks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.fez4ih5m/000001_ocr_hocr.hocr'
	at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:190)
	at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:85)
	at stirling.software.SPDF.controller.api.misc.OCRController.processPdfWithOCR(OCRController.java:148)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:255)
	at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:188)
	at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:925)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:830)
	at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
	at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
	at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:547)
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
	at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:614)
	at org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1614)
	at org.eclipse.jetty.ee10.websocket.servlet.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:195)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
	at stirling.software.SPDF.config.MetricsFilter.doFilterInternal(MetricsFilter.java:62)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
	at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
	at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
	at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
	at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1547)
	at org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:814)
	at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:431)
	at org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:464)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:571)
	at org.eclipse.jetty.ee10.servlet.SessionHandler.handle(SessionHandler.java:703)
	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:765)
	at org.eclipse.jetty.server.Server.handle(Server.java:179)
	at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:619)
	at org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:411)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:478)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:441)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:410)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:971)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1201)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1156)
	at java.base/java.lang.Thread.run(Thread.java:1583)

StoiaCode, Jun 03 '24 09:06