OCR leads to Internal Server Error: Command process failed with exit code 8
I was helped getting OCR working here: https://github.com/Stirling-Tools/Stirling-PDF/issues/821
It partially works: I can now select the downloaded German language for OCR, but a simple test fails with an error and a stack trace (the same happens with English).
Any idea what could be wrong? Any other info I can submit?
java.io.IOException: Command process failed with exit code 8
at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:192)
at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:82)
at stirling.software.SPDF.controller.api.misc.OCRController.processPdfWithOCR(OCRController.java:151)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:261)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:189)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:917)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:829)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:590)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:658)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:205)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
at stirling.software.SPDF.config.security.IPRateLimitingFilter.doFilter(IPRateLimitingFilter.java:61)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
at stirling.software.SPDF.config.MetricsFilter.doFilterInternal(MetricsFilter.java:62)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:110)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:110)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
at stirling.software.SPDF.config.security.UserBasedRateLimitingFilter.doFilterInternal(UserBasedRateLimitingFilter.java:48)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
at org.springframework.web.filter.CompositeFilter$VirtualFilterChain.doFilter(CompositeFilter.java:108)
at org.springframework.security.web.FilterChainProxy.lambda$doFilterInternal$3(FilterChainProxy.java:231)
at org.springframework.security.web.ObservationFilterChainDecorator$FilterObservation$SimpleFilterObservation.lambda$wrap$1(ObservationFilterChainDecorator.java:479)
at org.springframework.security.web.ObservationFilterChainDecorator$AroundFilterObservation$SimpleAroundFilterObservation.lambda$wrap$1(ObservationFilterChainDecorator.java:340)
at org.springframework.security.web.ObservationFilterChainDecorator.lambda$wrapSecured$0(ObservationFilterChainDecorator.java:82)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:128)
at org.springframework.security.web.access.intercept.AuthorizationFilter.doFilter(AuthorizationFilter.java:100)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:126)
at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:120)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:100)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at org.springframework.security.web.authentication.rememberme.RememberMeAuthenticationFilter.doFilter(RememberMeAuthenticationFilter.java:110)
at org.springframework.security.web.authentication.rememberme.RememberMeAuthenticationFilter.doFilter(RememberMeAuthenticationFilter.java:101)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:179)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:63)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at stirling.software.SPDF.config.security.FirstLoginFilter.doFilterInternal(FirstLoginFilter.java:51)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:227)
at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:221)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at stirling.software.SPDF.config.security.IPRateLimitingFilter.doFilter(IPRateLimitingFilter.java:61)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at stirling.software.SPDF.config.security.UserAuthenticationFilter.doFilterInternal(UserAuthenticationFilter.java:90)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:107)
at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:93)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at org.springframework.web.filter.CorsFilter.doFilterInternal(CorsFilter.java:91)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at org.springframework.security.web.header.HeaderWriterFilter.doHeadersAfter(HeaderWriterFilter.java:90)
at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:75)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at org.springframework.security.web.context.SecurityContextHolderFilter.doFilter(SecurityContextHolderFilter.java:82)
at org.springframework.security.web.context.SecurityContextHolderFilter.doFilter(SecurityContextHolderFilter.java:69)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:62)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:227)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at org.springframework.security.web.session.DisableEncodeUrlFilter.doFilterInternal(DisableEncodeUrlFilter.java:42)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.wrapFilter(ObservationFilterChainDecorator.java:240)
at org.springframework.security.web.ObservationFilterChainDecorator$AroundFilterObservation$SimpleAroundFilterObservation.lambda$wrap$0(ObservationFilterChainDecorator.java:323)
at org.springframework.security.web.ObservationFilterChainDecorator$ObservationFilter.doFilter(ObservationFilterChainDecorator.java:224)
at org.springframework.security.web.ObservationFilterChainDecorator$VirtualFilterChain.doFilter(ObservationFilterChainDecorator.java:137)
at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:233)
at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:191)
at org.springframework.web.filter.CompositeFilter$VirtualFilterChain.doFilter(CompositeFilter.java:113)
at org.springframework.web.servlet.handler.HandlerMappingIntrospector.lambda$createCacheFilter$3(HandlerMappingIntrospector.java:195)
at org.springframework.web.filter.CompositeFilter$VirtualFilterChain.doFilter(CompositeFilter.java:113)
at org.springframework.web.filter.CompositeFilter.doFilter(CompositeFilter.java:74)
at org.springframework.security.config.annotation.web.configuration.WebMvcSecurityConfiguration$CompositeFilterChainProxy.doFilter(WebMvcSecurityConfiguration.java:225)
at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:352)
at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:268)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:167)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:482)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:115)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:735)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:340)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:391)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:896)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1744)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)
at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.base/java.lang.Thread.run(Thread.java:840)
There might be more output in the docker logs that will help with this issue.
Yes, I have exported the logs to a file, but it's 500 KB in size. Let me know if the logs below are not enough to help.
Here is the docker-compose.yml I use:
services:
  # https://github.com/Frooodle/Stirling-PDF
  stirling-pdf:
    image: frooodle/s-pdf:latest
    hostname: stirling-pdf
    container_name: stirling-pdf
    restart: "no"
    volumes:
      # - ./trainingData:/usr/share/tessdata #Required for extra OCR languages
      - ./trainingData:/usr/share/tesseract-ocr/5/tessdata #Required for extra OCR languages
      - ./extraConfigs:/configs
      # - /location/of/customFiles:/customFiles/
    environment:
      - DOCKER_ENABLE_SECURITY=true
      - APP_LOCALE=en_GB
      - APP_HOME_NAME=PDF Toolbox
      - APP_HOME_DESCRIPTION=""
      - APP_NAVBAR_NAME=PDF Toolbox
      - APP_ROOT_PATH=/
    networks:
      stirling-pdf:
        ipv4_address: 192.168.192.94
      traefik_stirling-pdf:
    # mac_address: replaceme
    cpus: 1
    mem_limit: 2G
    labels:
      # enable Traefik
      - traefik.enable=true
      # Network
      - traefik.docker.network=traefik_stirling-pdf
      # Router
      - traefik.http.routers.pdftools.tls=true
      - traefik.http.routers.pdftools.entrypoints=websecure
      - traefik.http.routers.pdftools.rule=Host(`pdf-tools.domain.tld`)
      # - traefik.http.routers.pdftools.middlewares=secHeaders@file,localIPsOnly@file,authentik@docker
      - traefik.http.routers.pdftools.middlewares=secHeaders@file
      - traefik.http.routers.pdftools.service=pdftools
      # Service
      - traefik.http.services.pdftools.loadbalancer.server.port=8080
      - traefik.http.services.pdftools.loadbalancer.server.scheme=http
networks:
  stirling-pdf:
    name: stirling-pdf
    driver: macvlan
    ipam:
      config:
        - subnet: 192.168.192.92/30
          gateway: 192.168.192.93
    driver_opts:
      parent: vmbr1.1021
    external: false
  traefik_stirling-pdf:
    external: true
    internal: true
    name: traefik_stirling-pdf
Here is the content of the project folder and of the folder mounted as trainingData:
ls -al
total 168
drwxrwx--- 4 1000 1000 7 Feb 18 17:41 .
drwxrwx--- 40 1000 root 41 Feb 6 21:11 ..
-rwxrwx--- 1 1000 1000 1831 Feb 17 23:19 docker-compose.yml
drwxrwx--- 2 1000 1000 5 Feb 17 19:19 extraConfigs
drwxrwx--- 4 1000 1000 8 Feb 17 19:28 trainingData
ls -al trainingData/
total 10400
drwxrwx--- 4 1000 1000 8 Feb 17 19:28 .
drwxrwx--- 4 1000 1000 7 Feb 18 17:41 ..
drwxrwx--- 2 1000 1000 27 Sep 14 17:01 configs
-rwxrwx--- 1 1000 1000 14089 Feb 17 19:28 deu.traineddata
-rwxrwx--- 1 1000 1000 4113088 Sep 14 17:01 eng.traineddata
-rwxrwx--- 1 1000 1000 10562727 Sep 14 17:01 osd.traineddata
-rwxrwx--- 1 1000 1000 572 Sep 14 17:01 pdf.ttf
drwxrwx--- 2 1000 1000 8 Sep 14 17:01 tessconfigs
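In the listing above, deu.traineddata is 14089 bytes while eng.traineddata is 4113088 bytes. As a hedged aside, the Python sketch below lists every *.traineddata file in a directory that falls under a size threshold. The function name find_small_traineddata and the 100 KB threshold are assumptions chosen for illustration; neither Stirling-PDF nor Tesseract defines them, and a small file is only a hint that a download may be incomplete, not proof.

```python
from pathlib import Path


def find_small_traineddata(directory, min_bytes=100_000):
    """Return (name, size) pairs for *.traineddata files smaller than min_bytes.

    min_bytes is an arbitrary illustrative threshold, not a Tesseract rule.
    """
    small = []
    for path in sorted(Path(directory).glob("*.traineddata")):
        size = path.stat().st_size
        if size < min_bytes:
            small.append((path.name, size))
    return small


if __name__ == "__main__":
    # Host-side folder from the compose file's volume mapping (assumed to be
    # relative to the current working directory).
    for name, size in find_small_traineddata("./trainingData"):
        print(f"{name}: only {size} bytes")
```

Any file it reports could then be compared against a fresh download of the same language file.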
Here are some of the related log lines I noticed. If you need more, I can try uploading the full log file.
stirling-pdf | 22:20:28.675 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf._pipeline - 1 resolution (599.7701999999999, 599.7701999999999)
stirling-pdf | 22:20:28.851 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - 2 Running: ['tesseract', '-l', 'deu', '/tmp/ocrmypdf.io.x5gxaeno/000002_ocr.png', '/tmp/ocrmypdf.io.x5gxaeno/000002_ocr_hocr', 'hocr', 'txt']
stirling-pdf | 22:20:28.861 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - ERROR ocrmypdf._exec.tesseract - 2 [tesseract] Error opening data file /usr/share/tesseract-ocr/5/tessdata/deu.traineddata
stirling-pdf | 22:20:28.862 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 2 [tesseract] Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
stirling-pdf | 22:20:28.862 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 2 [tesseract] Failed loading language 'deu'
stirling-pdf | 22:20:28.862 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 2 [tesseract] Tesseract couldn't load any languages!
stirling-pdf | 22:20:28.862 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 2 [tesseract] Could not initialize tesseract.
stirling-pdf | 22:20:28.910 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - 1 Running: ['tesseract', '-l', 'deu', '/tmp/ocrmypdf.io.x5gxaeno/000001_ocr.png', '/tmp/ocrmypdf.io.x5gxaeno/000001_ocr_hocr', 'hocr', 'txt']
stirling-pdf | 22:20:28.942 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - ERROR ocrmypdf._exec.tesseract - 1 [tesseract] Error opening data file /usr/share/tesseract-ocr/5/tessdata/deu.traineddata
stirling-pdf | 22:20:28.943 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 1 [tesseract] Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
stirling-pdf | 22:20:28.943 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 1 [tesseract] Failed loading language 'deu'
stirling-pdf | 22:20:28.943 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 1 [tesseract] Tesseract couldn't load any languages!
stirling-pdf | 22:20:28.943 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 1 [tesseract] Could not initialize tesseract.
stirling-pdf | 22:20:28.943 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor -
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - ERROR ocrmypdf._pipelines._common - ExitCodeException
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - Traceback (most recent call last):
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_exec/tesseract.py", line 312, in generate_hocr
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - p = run(args_tesseract, stdout=PIPE, stderr=STDOUT, timeout=timeout, check=True)
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/subprocess/__init__.py", line 63, in run
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - proc = subprocess_run(args, env=env, check=check, **kwargs)
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/lib/python3.10/subprocess.py", line 526, in run
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - raise CalledProcessError(retcode, process.args,
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - subprocess.CalledProcessError: Command '['tesseract', '-l', 'deu', '/tmp/ocrmypdf.io.x5gxaeno/000002_ocr.png', '/tmp/ocrmypdf.io.x5gxaeno/000002_ocr_hocr', 'hocr', 'txt']' returned non-zero exit status 1.
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor -
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - The above exception was the direct cause of the following exception:
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor -
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - Traceback (most recent call last):
stirling-pdf | 22:20:28.945 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - return fn(options, plugin_manager)
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_pipelines/ocr.py", line 192, in _run_pipeline
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - optimize_messages = exec_concurrent(context, executor)
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_pipelines/ocr.py", line 119, in exec_concurrent
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - executor(
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_concurrent.py", line 74, in __call__
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - self._execute(
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 141, in _execute
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - result = future.result()
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - return self.__get_result()
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - raise self._exception
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - result = self.fn(*self.args, **self.kwargs)
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_pipelines/ocr.py", line 82, in _exec_page_sync
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - ocr_out, text_out = _image_to_ocr_text(page_context, ocr_image_out)
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_pipelines/ocr.py", line 63, in _image_to_ocr_text
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - hocr_out, text_out = ocr_engine_hocr(ocr_image_out, page_context)
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_pipeline.py", line 652, in ocr_engine_hocr
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - ocr_engine.generate_hocr(
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/builtin_plugins/tesseract_ocr.py", line 248, in generate_hocr
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - tesseract.generate_hocr(
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/local/lib/python3.10/dist-packages/ocrmypdf/_exec/tesseract.py", line 326, in generate_hocr
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - raise SubprocessOutputError() from e
stirling-pdf | 22:20:28.946 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - ocrmypdf.exceptions.SubprocessOutputError
stirling-pdf | 22:20:28.999 [http-nio-8080-exec-3] ERROR o.a.c.c.C.[.[.[.[dispatcherServlet] - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception
stirling-pdf | java.io.IOException: Command process failed with exit code 7
stirling-pdf | at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:192)
stirling-pdf | at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:82)
stirling-pdf | at stirling.software.SPDF.controller.api.misc.OCRController.processPdfWithOCR(OCRController.java:151)
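Since the full log is large, it may help to reduce it to the distinct tesseract messages before sharing. The following is a minimal Python sketch under that assumption; the helper name tesseract_messages is hypothetical, and the pattern only relies on the "[tesseract]" marker visible in the lines above.

```python
import re

# Matches the "[tesseract] message" tail of an ocrmypdf log line.
TESSERACT_MSG = re.compile(r"\[tesseract\]\s*(.*)$")


def tesseract_messages(lines):
    """Return the distinct [tesseract] messages in first-seen order."""
    seen = []
    for line in lines:
        match = TESSERACT_MSG.search(line)
        if match:
            message = match.group(1).strip()
            if message and message not in seen:
                seen.append(message)
    return seen


# Two lines copied from the log excerpt above, used as sample input.
sample = [
    "stirling-pdf | 22:20:28.861 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - "
    "ERROR ocrmypdf._exec.tesseract - 2 [tesseract] Error opening data file "
    "/usr/share/tesseract-ocr/5/tessdata/deu.traineddata",
    "stirling-pdf | 22:20:28.862 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - "
    "INFO ocrmypdf._exec.tesseract - 2 [tesseract] Failed loading language 'deu'",
]
for message in tesseract_messages(sample):
    print(message)
```

The same function accepts an open file object, so an exported log file can be passed in directly instead of the sample list.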
ERROR ocrmypdf._exec.tesseract - 1 [tesseract] Error opening data file /usr/share/tesseract-ocr/5/tessdata/deu.traineddata
stirling-pdf | 22:20:28.943 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 1 [tesseract] Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
stirling-pdf | 22:20:28.943 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 1 [tesseract] Failed loading language 'deu'
stirling-pdf | 22:20:28.943 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 1 [tesseract]
Are the permissions on the files all set up correctly?
That is a good question. From my point of view, yes, as they are owned by my user and group (1000:1000).
The question is, what is the user inside this container running as?
A quick docker exec -ti stirling-pdf bash followed by ps aux shows me everything is running as root.
I tried chown -R root:root stirling-pdf/ followed by
stirling-pdf# ls -al
total 124
drwxrwx--- 4 root root 7 Feb 18 19:11 .
drwxrwx--- 39 1000 root 40 Feb 18 18:38 ..
-rwxrwx--- 1 root root 1831 Feb 17 23:19 docker-compose.yml
-rwxrwx--- 1 root root 445242 Feb 18 17:33 docker-logs-English.txt
-rwxrwx--- 1 root root 51136 Feb 18 19:10 docker-logs-German.txt
drwxrwx--- 2 root root 5 Feb 17 19:19 extraConfigs
drwxrwx--- 4 root root 8 Feb 17 19:28 trainingData
I then started the container from scratch. The problem remains the same. Any other ideas?
stirling-pdf | 18:10:36.402 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - 2 Running: ['tesseract', '-l', 'deu', '/tmp/ocrmypdf.io.br4nhepu/000002_ocr.png', '/tmp/ocrmypdf.io.br4nhepu/000002_ocr_hocr', 'hocr', 'txt']
stirling-pdf | 18:10:36.413 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - ERROR ocrmypdf._exec.tesseract - 2 [tesseract] Error opening data file /usr/share/tesseract-ocr/5/tessdata/deu.traineddata
stirling-pdf | 18:10:36.413 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 2 [tesseract] Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
stirling-pdf | 18:10:36.413 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 2 [tesseract] Failed loading language 'deu'
stirling-pdf | 18:10:36.413 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 2 [tesseract] Tesseract couldn't load any languages!
stirling-pdf | 18:10:36.413 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._exec.tesseract - 2 [tesseract] Could not initialize tesseract.
stirling-pdf | 18:10:36.504 [Thread-0] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf._pipeline - 1 convert
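The tesseract message only says that the data file could not be opened, not why. As a minimal sketch, assuming Python is available where the check is run (the traceback above shows Python 3.10 inside the container), the helper below tries to open a path and reports whether it is missing, blocked by permissions, or readable. The name probe_readable is hypothetical, and the path in the example is the one from the error message.

```python
import errno


def probe_readable(path):
    """Try to open path for reading and describe the outcome as a string."""
    try:
        with open(path, "rb") as handle:
            handle.read(1)  # force an actual read, not just an open
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "permission denied"
    except OSError as exc:
        # Any other OS-level failure, reported by its symbolic errno name.
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
    return "readable"


if __name__ == "__main__":
    # Path taken from the tesseract error message in the log.
    target = "/usr/share/tesseract-ocr/5/tessdata/deu.traineddata"
    print(target, "->", probe_readable(target))
```

Running it inside the container, as the same user that runs tesseract, would separate a mount or permission problem from a problem with the file's contents. A "readable" result would point toward the contents.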
Right now Stirling-PDF has not been tested with a non-root user, so I can't say. Support for that is planned for the next release.
I understand, but I did a chown -R root:root stirling-pdf/ and started the container as root. All other functions of it work, except for this OCR error.
Right, sorry, I misunderstood. I'm not sure what causes this. Can you try updating to the latest Stirling-PDF and try again?
No worries. Anyway, I can live with the way it works. I don't really need OCR; I just wanted to test how well it works.
I just did a docker compose pull and the problem still persists. Feel free to keep or close the ticket.