Stirling-PDF
[Bug]: OCR remove signatures before proceeding
The Problem
I have a PDF test file, attached here (test.pdf), which I submitted to the Stirling-PDF Cleanup Scans/OCR function.
The attached file option.png shows the options I selected.
Version of Stirling-PDF
0.26.1 - Docker on a Synology device
Last Working Version of Stirling-PDF
None tested before
Page Where the Problem Occurred
http://192.168.1.50:8080/ocr-pdf?lang=en_US (internal URL, included only to show the page I used)
Docker Configuration
No response
Relevant Log Output
ERROR
---------
Internal Server Error: java.io.IOException: Command process failed with exit code 2.
ocrmypdf.exceptions.DigitalSignatureError: Input PDF has a digital signature. OCR would alter the document, invalidating the signature.
STACK TRACE
------------
java.io.IOException: Command process failed with exit code 2. Error message: DEBUG ocrmypdf - ocrmypdf 16.1.1
DEBUG ocrmypdf.subprocess - Running: ['unpaper', '--version']
DEBUG ocrmypdf.subprocess - Found unpaper 7.0.0
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Found tesseract 5.3.4
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Found gs 10.2.1
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = [DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 1.146822
[DS] Selected Device[1]: "(null)" (Native)
List of available languages in "/usr/share/tessdata/" (3):
eng
ita
lat
DEBUG ocrmypdf.helpers - pikepdf mmap enabled
DEBUG ocrmypdf.helpers - os.symlink(/tmp/input_1371985821737702192.pdf, /tmp/ocrmypdf.io.c_b14rry/origin)
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.c_b14rry/origin, /tmp/ocrmypdf.io.c_b14rry/origin.pdf)
DEBUG root - Gathering info with 1 thread workers
DEBUG ocrmypdf.helpers - pikepdf mmap enabled
ERROR ocrmypdf._pipelines._common - ExitCodeException
Traceback (most recent call last):
File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler
return fn(options, plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 188, in _run_pipeline
validate_pdfinfo_options(context)
File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 204, in validate_pdfinfo_options
raise DigitalSignatureError()
ocrmypdf.exceptions.DigitalSignatureError: Input PDF has a digital signature. OCR would alter the document, invalidating the signature.
at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:190)
at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:85)
at stirling.software.SPDF.controller.api.misc.OCRController.processPdfWithOCR(OCRController.java:148)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:255)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:188)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:925)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:830)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:547)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:614)
at org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)
at org.eclipse.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1614)
at org.eclipse.jetty.ee10.websocket.servlet.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:195)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at stirling.software.SPDF.config.MetricsFilter.doFilterInternal(MetricsFilter.java:61)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at org.eclipse.jetty.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1547)
at org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:814)
at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:431)
at org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:464)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:571)
at org.eclipse.jetty.ee10.servlet.SessionHandler.handle(SessionHandler.java:703)
at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:765)
at org.eclipse.jetty.server.Server.handle(Server.java:179)
at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:619)
at org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:411)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:971)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1201)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1156)
at java.base/java.lang.Thread.run(Thread.java:1583)
Additional Information
No response
Browsers Affected
No response
No Duplicate of the Issue
- [X] I have verified that there are no existing issues raised related to my problem.
The issue is:
Input PDF has a digital signature. OCR would alter the document, invalidating the signature.
This looks like a duplicate of the issue resolved for PDF/A conversion: https://github.com/Stirling-Tools/Stirling-PDF/pull/1360
Will add the same solution; thanks for raising this.
We no longer use ocrmypdf
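For setups that still call ocrmypdf directly, here is a minimal sketch of one possible workaround. It assumes the ocrmypdf CLI option `--invalidate-digital-signatures` is available in your installed version (check `ocrmypdf --help`). That option lets OCR proceed on a signed PDF instead of raising `DigitalSignatureError`, at the cost of breaking the signature. This is not necessarily the approach taken in PR #1360 or in Stirling-PDF itself. The helpers `build_ocrmypdf_command` and `run_ocr` are hypothetical names used only for illustration.

```python
import shutil
import subprocess


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: list[str],
    allow_signed: bool = False,
) -> list[str]:
    """Build an ocrmypdf argument list (hypothetical helper).

    ocrmypdf takes Tesseract language codes joined with '+', e.g. 'eng+ita'.
    With allow_signed=True, the --invalidate-digital-signatures option is
    appended (assumed to exist in your ocrmypdf version): OCR then proceeds
    on a signed PDF, and the existing signature will no longer validate.
    """
    cmd = ["ocrmypdf", "-l", "+".join(languages)]
    if allow_signed:
        cmd.append("--invalidate-digital-signatures")
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(
    input_pdf: str,
    output_pdf: str,
    languages: list[str],
    allow_signed: bool = False,
) -> subprocess.CompletedProcess:
    """Run ocrmypdf and return the completed process (hypothetical helper)."""
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf was not found on PATH")
    # Without allow_signed, a signed input makes ocrmypdf exit non-zero
    # (exit code 2 in the log in this report); check returncode/stderr.
    return subprocess.run(
        build_ocrmypdf_command(input_pdf, output_pdf, languages, allow_signed),
        capture_output=True,
        text=True,
        check=False,
    )
```

For example, `run_ocr("test.pdf", "out.pdf", ["eng", "ita"], allow_signed=True)`. Keeping the flag opt-in preserves ocrmypdf's safe default: a signed document is altered only when the caller explicitly accepts losing the signature.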