OCR leads to Internal Server Error: Command process failed with exit code 8

Open ovizii opened this issue 1 year ago • 10 comments

I was helped getting OCR working here: https://github.com/Stirling-Tools/Stirling-PDF/issues/821

It partially works: I can now select the downloaded German language for OCR, but a simple test fails with an error and a stack trace (the same happens with English).

Any idea what could be wrong? Any other info I can submit?

java.io.IOException: Command process failed with exit code 8
	at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:192)
	at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:82)
	at stirling.software.SPDF.controller.api.misc.OCRController.processPdfWithOCR(OCRController.java:151)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:261)
	at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:189)
	at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:917)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:829)
	at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
	at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
	at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:590)
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
	at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:658)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:205)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
	at stirling.software.SPDF.config.security.IPRateLimitingFilter.doFilter(IPRateLimitingFilter.java:61)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
	at stirling.software.SPDF.config.MetricsFilter.doFilterInternal(MetricsFilter.java:62)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:110)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:110)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
	at stirling.software.SPDF.config.security.UserBasedRateLimitingFilter.doFilterInternal(UserBasedRateLimitingFilter.java:48)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
	at org.springframework.web.filter.CompositeFilter$VirtualFilterChain.doFilter(CompositeFilter.java:108)
	at org.springframework.security.web.FilterChainProxy.lambda$doFilterInternal$3(FilterChainProxy.java:231)
	at org.springframework.security.web.ObservationFilterChainDecorator$FilterObservation$SimpleFilterObservation.lambda$wrap$1(ObservationFilterChainDecorator.java:479)
	at org.springframework.security.web.ObservationFilterChainDecorator$AroundFilterObservation$SimpleAroundFilterObservation.lambda$wrap$1(ObservationFilterChainDecorator.java:340)
	at org.springframework.security.web.ObservationFilterChainDecorator.lambda$wrapSecured$0(ObservationFilterChainDecorator.java:82)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:128)
	at org.springframework.security.web.access.intercept.AuthorizationFilter.doFilter(AuthorizationFilter.java:100)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:126)
	at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:120)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:100)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at org.springframework.security.web.authentication.rememberme.RememberMeAuthenticationFilter.doFilter(RememberMeAuthenticationFilter.java:110)
	at org.springframework.security.web.authentication.rememberme.RememberMeAuthenticationFilter.doFilter(RememberMeAuthenticationFilter.java:101)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:179)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:63)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at stirling.software.SPDF.config.security.FirstLoginFilter.doFilterInternal(FirstLoginFilter.java:51)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:227)
	at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:221)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at stirling.software.SPDF.config.security.IPRateLimitingFilter.doFilter(IPRateLimitingFilter.java:61)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at stirling.software.SPDF.config.security.UserAuthenticationFilter.doFilterInternal(UserAuthenticationFilter.java:90)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:107)
	at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:93)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at org.springframework.web.filter.CorsFilter.doFilterInternal(CorsFilter.java:91)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at org.springframework.security.web.header.HeaderWriterFilter.doHeadersAfter(HeaderWriterFilter.java:90)
	at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:75)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at org.springframework.security.web.context.SecurityContextHolderFilter.doFilter(SecurityContextHolderFilter.java:82)
	at org.springframework.security.web.context.SecurityContextHolderFilter.doFilter(SecurityContextHolderFilter.java:69)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:62)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at org.springframework.security.web.session.DisableEncodeUrlFilter.doFilterInternal(DisableEncodeUrlFilter.java:42)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
	at org.springframework.security.web.ObservationFilterChainDecorator$AroundFilterObservation$SimpleAroundFilterObservation.lambda$wrap$0(ObservationFilterChainDecorator.java:323)
	at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:224)
	at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
	at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:233)
	at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:191)
	at org.springframework.web.filter.CompositeFilter$VirtualFilterChain.doFilter(CompositeFilter.java:113)
	at org.springframework.web.servlet.handler.HandlerMappingIntrospector.lambda$createCacheFilter$3(HandlerMappingIntrospector.java:195)
	at org.springframework.web.filter.CompositeFilter$VirtualFilterChain.doFilter(CompositeFilter.java:113)
	at org.springframework.web.filter.CompositeFilter.doFilter(CompositeFilter.java:74)
	at org.springframework.security.config.annotation.web.configuration.WebMvcSecurityConfiguration$CompositeFilterChainProxy.doFilter(WebMvcSecurityConfiguration.java:225)
	at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:352)
	at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:268)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
	at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
	at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
	at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
	at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:167)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:482)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:115)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
	at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:735)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:340)
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:391)
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:896)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1744)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)
	at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
	at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.base/java.lang.Thread.run(Thread.java:840)

ovizii avatar Feb 17 '24 22:02 ovizii

There might be more information in the Docker logs that would help with this issue.

Frooodle avatar Feb 17 '24 23:02 Frooodle

Yes, I have exported the logs to a file, but it's 500 KB in size. Let me know if the logs below are not enough to help.

Here is the docker-compose.yml I use:

services:

# https://github.com/Frooodle/Stirling-PDF

  stirling-pdf:
    image: frooodle/s-pdf:latest
    hostname: stirling-pdf
    container_name: stirling-pdf
    restart: "no"
    volumes:
#      - ./trainingData:/usr/share/tessdata #Required for extra OCR languages
      - ./trainingData:/usr/share/tesseract-ocr/5/tessdata #Required for extra OCR languages
      - ./extraConfigs:/configs
#     - /location/of/customFiles:/customFiles/
    environment:
      - DOCKER_ENABLE_SECURITY=true
      - APP_LOCALE=en_GB
      - APP_HOME_NAME=PDF Toolbox
      - APP_HOME_DESCRIPTION=""
      - APP_NAVBAR_NAME=PDF Toolbox
      - APP_ROOT_PATH=/
    networks:
      stirling-pdf:
        ipv4_address: 192.168.192.94
      traefik_stirling-pdf:
#    mac_address: replaceme
    cpus: 1
    mem_limit: 2G
    labels:
      # enable Traefik
      - traefik.enable=true
      # Network
      - traefik.docker.network=traefik_stirling-pdf
      # Router
      - traefik.http.routers.pdftools.tls=true
      - traefik.http.routers.pdftools.entrypoints=websecure
      - traefik.http.routers.pdftools.rule=Host(`pdf-tools.domain.tld`)
#      - traefik.http.routers.pdftools.middlewares=secHeaders@file,localIPsOnly@file,authentik@docker
      - traefik.http.routers.pdftools.middlewares=secHeaders@file
      - traefik.http.routers.pdftools.service=pdftools
      # Service
      - traefik.http.services.pdftools.loadbalancer.server.port=8080
      - traefik.http.services.pdftools.loadbalancer.server.scheme=http


networks:

  stirling-pdf:
    name: stirling-pdf
    driver: macvlan
    ipam:
      config:
        - subnet: 192.168.192.92/30
          gateway: 192.168.192.93
    driver_opts:
      parent: vmbr1.1021
    external: false

  traefik_stirling-pdf:
    external: true
    internal: true
    name: traefik_stirling-pdf

Here are the contents of the project folder and of the folder mounted as trainingData:

ls -al
total 168
drwxrwx---  4 1000 1000      7 Feb 18 17:41 .
drwxrwx--- 40 1000 root     41 Feb  6 21:11 ..
-rwxrwx---  1 1000 1000   1831 Feb 17 23:19 docker-compose.yml
drwxrwx---  2 1000 1000      5 Feb 17 19:19 extraConfigs
drwxrwx---  4 1000 1000      8 Feb 17 19:28 trainingData

ls -al trainingData/
total 10400
drwxrwx--- 4 1000 1000        8 Feb 17 19:28 .
drwxrwx--- 4 1000 1000        7 Feb 18 17:41 ..
drwxrwx--- 2 1000 1000       27 Sep 14 17:01 configs
-rwxrwx--- 1 1000 1000    14089 Feb 17 19:28 deu.traineddata
-rwxrwx--- 1 1000 1000  4113088 Sep 14 17:01 eng.traineddata
-rwxrwx--- 1 1000 1000 10562727 Sep 14 17:01 osd.traineddata
-rwxrwx--- 1 1000 1000      572 Sep 14 17:01 pdf.ttf
drwxrwx--- 2 1000 1000        8 Sep 14 17:01 tessconfigs
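
One detail in the listing may be worth checking: deu.traineddata is 14089 bytes, while eng.traineddata and osd.traineddata are several megabytes each. A minimal sketch for flagging language files that look truncated or are not model data at all, assuming Python 3 is available. The function name `check_traineddata`, the 100 000 byte threshold, and the HTML check are illustrative assumptions, not part of Stirling-PDF or Tesseract:

```python
from pathlib import Path

# Assumed threshold for illustration only: the other models in the
# listing above are several megabytes, so anything this small is suspect.
MIN_PLAUSIBLE_BYTES = 100_000


def check_traineddata(directory, min_bytes=MIN_PLAUSIBLE_BYTES):
    """Return {filename: reason} for .traineddata files that look wrong."""
    problems = {}
    for path in sorted(Path(directory).glob("*.traineddata")):
        size = path.stat().st_size
        # Peek at the first bytes to catch a saved web page
        # stored under a .traineddata name.
        with path.open("rb") as fh:
            head = fh.read(64).lstrip().lower()
        if head.startswith((b"<!doctype", b"<html")):
            problems[path.name] = "looks like an HTML page, not model data"
        elif size < min_bytes:
            problems[path.name] = f"only {size} bytes"
    return problems


# Example call (path taken from the compose file above):
# print(check_traineddata("./trainingData"))
```

Running it against ./trainingData would at least show whether the German file stands out. It does not by itself explain why English fails too.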

Here are some of the related log lines I noticed. If you need more, I can try uploading the full log file.

stirling-pdf  | 22:20:28.675 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf._pipeline -    1  resolution (599.7701999999999, 599.7701999999999)
stirling-pdf  | 22:20:28.851 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf.subprocess -    2  Running: ['tesseract', '-l', 'deu', '/tmp/ocrmypdf.io.x5gxaeno/000002_ocr.png', '/tmp/ocrmypdf.io.x5gxaeno/000002_ocr_hocr', 'hocr', 'txt']
stirling-pdf  | 22:20:28.861 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   ERROR ocrmypdf._exec.tesseract -    2  [tesseract] Error opening data file /usr/share/tesseract-ocr/5/tessdata/deu.traineddata
stirling-pdf  | 22:20:28.862 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -    INFO ocrmypdf._exec.tesseract -    2  [tesseract] Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
stirling-pdf  | 22:20:28.862 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -    INFO ocrmypdf._exec.tesseract -    2  [tesseract] Failed loading language 'deu'
stirling-pdf  | 22:20:28.862 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -    INFO ocrmypdf._exec.tesseract -    2  [tesseract] Tesseract couldn't load any languages!
stirling-pdf  | 22:20:28.862 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -    INFO ocrmypdf._exec.tesseract -    2  [tesseract] Could not initialize tesseract.
stirling-pdf  | 22:20:28.910 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf.subprocess -    1  Running: ['tesseract', '-l', 'deu', '/tmp/ocrmypdf.io.x5gxaeno/000001_ocr.png', '/tmp/ocrmypdf.io.x5gxaeno/000001_ocr_hocr', 'hocr', 'txt']
stirling-pdf  | 22:20:28.942 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   ERROR ocrmypdf._exec.tesseract -    1  [tesseract] Error opening data file /usr/share/tesseract-ocr/5/tessdata/deu.traineddata
stirling-pdf  | 22:20:28.943 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -    INFO ocrmypdf._exec.tesseract -    1  [tesseract] Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
stirling-pdf  | 22:20:28.943 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -    INFO ocrmypdf._exec.tesseract -    1  [tesseract] Failed loading language 'deu'
stirling-pdf  | 22:20:28.943 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -    INFO ocrmypdf._exec.tesseract -    1  [tesseract] Tesseract couldn't load any languages!
stirling-pdf  | 22:20:28.943 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -    INFO ocrmypdf._exec.tesseract -    1  [tesseract] Could not initialize tesseract.
stirling-pdf  | 22:20:28.943 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   ERROR ocrmypdf._pipelines._common - ExitCodeException
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor - Traceback (most recent call last):
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_exec/tesseract.py", line 312, in generate_hocr
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     p = run(args_tesseract, stdout=PIPE, stderr=STDOUT, timeout=timeout, check=True)
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/subprocess/__init__.py", line 63, in run
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     proc = subprocess_run(args, env=env, check=check, **kwargs)
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/lib/python3.10/subprocess.py", line 526, in run
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     raise CalledProcessError(retcode, process.args,
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor - subprocess.CalledProcessError: Command '['tesseract', '-l', 'deu', '/tmp/ocrmypdf.io.x5gxaeno/000002_ocr.png', '/tmp/ocrmypdf.io.x5gxaeno/000002_ocr_hocr', 'hocr', 'txt']' returned non-zero exit status 1.
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor - The above exception was the direct cause of the following exception:
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor - Traceback (most recent call last):
stirling-pdf  | 22:20:28.945 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     return fn(options, plugin_manager)
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_pipelines/ocr.py", line 192, in _run_pipeline
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     optimize_messages = exec_concurrent(context, executor)
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_pipelines/ocr.py", line 119, in exec_concurrent
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     executor(
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_concurrent.py", line 74, in __call__
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     self._execute(
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 141, in _execute
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     result = future.result()
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     return self.__get_result()
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     raise self._exception
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     result = self.fn(*self.args, **self.kwargs)
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_pipelines/ocr.py", line 82, in _exec_page_sync
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     ocr_out, text_out = _image_to_ocr_text(page_context, ocr_image_out)
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_pipelines/ocr.py", line 63, in _image_to_ocr_text
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     hocr_out, text_out = ocr_engine_hocr(ocr_image_out, page_context)
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_pipeline.py", line 652, in ocr_engine_hocr
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     ocr_engine.generate_hocr(
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/builtin_plugins/tesseract_ocr.py", line 248, in generate_hocr
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     tesseract.generate_hocr(
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_exec/tesseract.py", line 326, in generate_hocr
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -     raise SubprocessOutputError() from e
stirling-pdf  | 22:20:28.946 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor - ocrmypdf.exceptions.SubprocessOutputError
stirling-pdf  | 22:20:28.999 [http-nio-8080-exec-3] ERROR o.a.c.c.C.[.[.[.[dispatcherServlet] - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception
stirling-pdf  | java.io.IOException: Command process failed with exit code 7
stirling-pdf  |         at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:192)
stirling-pdf  |         at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:82)
stirling-pdf  |         at stirling.software.SPDF.controller.api.misc.OCRController.processPdfWithOCR(OCRController.java:151)
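
Because the full log is about 500 KB, the Tesseract messages can be pulled out mechanically instead of by eye. A minimal sketch, assuming the log was exported to a text file and that the `[tesseract]` marker seen in the excerpt above identifies the relevant lines. `tesseract_messages` is a hypothetical helper name:

```python
def tesseract_messages(lines):
    """Return the unique text after each '[tesseract]' marker, in order."""
    seen = set()
    messages = []
    for line in lines:
        # partition() leaves marker empty when the line has no such tag.
        _, marker, message = line.partition("[tesseract]")
        message = message.strip()
        if marker and message and message not in seen:
            seen.add(message)
            messages.append(message)
    return messages


# Example call (file name as it appears later in the thread):
# with open("docker-logs-German.txt", encoding="utf-8", errors="replace") as fh:
#     for msg in tesseract_messages(fh):
#         print(msg)
```

Per-page repeats collapse into one entry, so the excerpt above would reduce to five distinct messages.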

ovizii avatar Feb 18 '24 16:02 ovizii

ERROR ocrmypdf._exec.tesseract - 1 [tesseract] Error opening data file /usr/share/tesseract-ocr/5/tessdata/deu.traineddata
stirling-pdf | 22:20:28.943 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 1 [tesseract] Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
stirling-pdf | 22:20:28.943 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 1 [tesseract] Failed loading language 'deu'
stirling-pdf | 22:20:28.943 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 1 [tesseract]

Are the permissions on the files all set up correctly?

Frooodle avatar Feb 18 '24 17:02 Frooodle

Are the permissions on the files all set up correctly?

That is a good question. From my point of view, yes, as they are owned by my user and group: 1000:1000.

The question is, which user is the process inside this container running as?

A quick docker exec -ti stirling-pdf bash followed by ps aux shows me everything is running as root.

I tried chown -R root:root stirling-pdf/ followed by

stirling-pdf# ls -al
total 124
drwxrwx---  4 root root      7 Feb 18 19:11 .
drwxrwx--- 39 1000 root     40 Feb 18 18:38 ..
-rwxrwx---  1 root root   1831 Feb 17 23:19 docker-compose.yml
-rwxrwx---  1 root root 445242 Feb 18 17:33 docker-logs-English.txt
-rwxrwx---  1 root root  51136 Feb 18 19:10 docker-logs-German.txt
drwxrwx---  2 root root      5 Feb 17 19:19 extraConfigs
drwxrwx---  4 root root      8 Feb 17 19:28 trainingData

I then started the container from scratch. The problem remains the same. Any other ideas?

stirling-pdf  | 18:10:36.402 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf.subprocess -    2  Running: ['tesseract', '-l', 'deu', '/tmp/ocrmypdf.io.br4nhepu/000002_ocr.png', '/tmp/ocrmypdf.io.br4nhepu/000002_ocr_hocr', 'hocr', 'txt']
stirling-pdf  | 18:10:36.413 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   ERROR ocrmypdf._exec.tesseract -    2  [tesseract] Error opening data file /usr/share/tesseract-ocr/5/tessdata/deu.traineddata
stirling-pdf  | 18:10:36.413 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -    INFO ocrmypdf._exec.tesseract -    2  [tesseract] Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
stirling-pdf  | 18:10:36.413 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -    INFO ocrmypdf._exec.tesseract -    2  [tesseract] Failed loading language 'deu'
stirling-pdf  | 18:10:36.413 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -    INFO ocrmypdf._exec.tesseract -    2  [tesseract] Tesseract couldn't load any languages!
stirling-pdf  | 18:10:36.413 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -    INFO ocrmypdf._exec.tesseract -    2  [tesseract] Could not initialize tesseract.
stirling-pdf  | 18:10:36.504 [Thread-0] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf._pipeline -    1  convert
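
Whether the file is readable can also be tested directly from inside the container instead of being inferred from ownership. A minimal sketch, assuming Python 3 is available in the container (the tracebacks earlier in the thread point to Python 3.10) and using the path from the Tesseract error message. `probe_file` is a hypothetical helper name:

```python
import os


def probe_file(path):
    """Try to actually open the file and report what happens."""
    report = {
        "path": path,
        # getuid() is POSIX-only; fall back to None elsewhere.
        "uid": os.getuid() if hasattr(os, "getuid") else None,
        "exists": os.path.exists(path),
    }
    try:
        with open(path, "rb") as fh:
            fh.read(1)
        report["readable"] = True
        report["size"] = os.path.getsize(path)
    except OSError as exc:
        report["readable"] = False
        report["error"] = f"{type(exc).__name__}: {exc.strerror}"
    return report


# Example call, run inside the container:
# print(probe_file("/usr/share/tesseract-ocr/5/tessdata/deu.traineddata"))
```

A `readable: False` result with a permission error would point at the mount or its ownership, while a successful read of a very small file would point at the file contents instead.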

ovizii avatar Feb 18 '24 18:02 ovizii

Right now Stirling-PDF has not been tested with a non-root user, so I can't say. It's planned for the next release.

Frooodle avatar Feb 18 '24 23:02 Frooodle

I understand, but I did a chown -R root:root stirling-pdf/ and started the container as root. All other functions of it work, except for this OCR error.

ovizii avatar Feb 18 '24 23:02 ovizii

Right, sorry, I misunderstood. I'm not sure what is causing this. Can you try updating to the latest Stirling-PDF and try again?

Frooodle avatar Feb 18 '24 23:02 Frooodle

No worries. Anyway, I can live with the way it works. I don't really need OCR; I just wanted to test how well it works.

I just did a docker compose pull and the problem still persists. Feel free to keep or close the ticket.

ovizii avatar Feb 18 '24 23:02 ovizii