[3rdparty]: Paperless-ngx fails on consuming a file
Simple sanity checks
- [x] This is an issue with an app that uses OCRmyPDF for OCR
- [x] I am using a recent version of the third party app
- [x] I will include a file that reproduces the issue
Third party app name and version
Paperless-ngx 2.14.7
Describe the bug
Paperless-ngx fails to consume the attached PDF: the Ghostscript PDF/A rendering step run by OCRmyPDF exits with an error.
Steps to reproduce
1. Import attached file into Paperless-ngx.
2. OCR is automatically triggered.
3. The process fails with the errors shown in the log below.
Files
o451229v21_160992A98S_202401.pdf
OCRmyPDF version
No response
Relevant log output
[2025-03-17 23:01:37,509] [ERROR] [paperless.consumer] Error occurred while consuming document o451229v21_160992A98S_202401.pdf: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_exec/ghostscript.py", line 288, in generate_pdfa
p = run_polling_stderr(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/subprocess/__init__.py", line 114, in run_polling_stderr
raise CalledProcessError(proc.returncode, args, output=None, stderr=stderr)
subprocess.CalledProcessError: Command '['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=RGB', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '/tmp/ocrmypdf.io.p49cqgey/pdfa.pdf', '-sstdout=%stderr', '/tmp/ocrmypdf.io.p49cqgey/pdfa.ps', '/tmp/ocrmypdf.io.p49cqgey/fix_docinfo.pdf']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 382, in parse
ocrmypdf.ocr(**args)
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/api.py", line 380, in ocr
return run_pipeline(options=options, plugin_manager=plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 214, in run_pipeline
return _run_pipeline(options, plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 181, in _run_pipeline
optimize_messages = exec_concurrent(context, executor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 145, in exec_concurrent
pdf, messages = postprocess(pdf, context, executor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 453, in postprocess
pdf_out = convert_to_pdfa(pdf_out, ps_stub_out, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 912, in convert_to_pdfa
context.plugin_manager.hook.generate_pdfa(
File "/usr/local/lib/python3.12/site-packages/pluggy/_hooks.py", line 513, in __call__
return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pluggy/_manager.py", line 120, in _hookexec
return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 139, in _multicall
raise exception.with_traceback(exception.__traceback__)
File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 103, in _multicall
res = hook_impl.function(*args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/builtin_plugins/ghostscript.py", line 131, in generate_pdfa
ghostscript.generate_pdfa(
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_exec/ghostscript.py", line 301, in generate_pdfa
raise SubprocessOutputError('Ghostscript PDF/A rendering failed') from e
ocrmypdf.exceptions.SubprocessOutputError: Ghostscript PDF/A rendering failed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 327, in main_wrap
raise exc_info[1]
File "/usr/src/paperless/src/documents/consumer.py", line 477, in run
document_parser.parse(self.working_copy, mime_type, self.filename)
File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 405, in parse
raise ParseError(
documents.parsers.ParseError: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
[2025-03-17 23:01:37,560] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: o451229v21_160992A98S_202401.pdf: Error occurred while consuming document o451229v21_160992A98S_202401.pdf: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_exec/ghostscript.py", line 288, in generate_pdfa
p = run_polling_stderr(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/subprocess/__init__.py", line 114, in run_polling_stderr
raise CalledProcessError(proc.returncode, args, output=None, stderr=stderr)
subprocess.CalledProcessError: Command '['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=RGB', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '/tmp/ocrmypdf.io.p49cqgey/pdfa.pdf', '-sstdout=%stderr', '/tmp/ocrmypdf.io.p49cqgey/pdfa.ps', '/tmp/ocrmypdf.io.p49cqgey/fix_docinfo.pdf']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 382, in parse
ocrmypdf.ocr(**args)
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/api.py", line 380, in ocr
return run_pipeline(options=options, plugin_manager=plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 214, in run_pipeline
return _run_pipeline(options, plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 181, in _run_pipeline
optimize_messages = exec_concurrent(context, executor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 145, in exec_concurrent
pdf, messages = postprocess(pdf, context, executor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 453, in postprocess
pdf_out = convert_to_pdfa(pdf_out, ps_stub_out, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 912, in convert_to_pdfa
context.plugin_manager.hook.generate_pdfa(
File "/usr/local/lib/python3.12/site-packages/pluggy/_hooks.py", line 513, in __call__
return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pluggy/_manager.py", line 120, in _hookexec
return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 139, in _multicall
raise exception.with_traceback(exception.__traceback__)
File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 103, in _multicall
res = hook_impl.function(*args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/builtin_plugins/ghostscript.py", line 131, in generate_pdfa
ghostscript.generate_pdfa(
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_exec/ghostscript.py", line 301, in generate_pdfa
raise SubprocessOutputError('Ghostscript PDF/A rendering failed') from e
ocrmypdf.exceptions.SubprocessOutputError: Ghostscript PDF/A rendering failed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 327, in main_wrap
raise exc_info[1]
File "/usr/src/paperless/src/documents/consumer.py", line 477, in run
document_parser.parse(self.working_copy, mime_type, self.filename)
File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 405, in parse
raise ParseError(
documents.parsers.ParseError: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/src/paperless/src/documents/tasks.py", line 154, in consume_file
msg = plugin.run()
^^^^^^^^^^^^
File "/usr/src/paperless/src/documents/consumer.py", line 509, in run
self._fail(
File "/usr/src/paperless/src/documents/consumer.py", line 151, in _fail
raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
documents.consumer.ConsumerError: o451229v21_160992A98S_202401.pdf: Error occurred while consuming document o451229v21_160992A98S_202401.pdf: SubprocessOutputError: Ghostscript PDF/A rendering failed. See logs for more information.
I'm getting exactly the same error with an invoice file I tried to upload to Paperless-ngx.
@GooRoo Which OCR mode did you use (skip, redo, force)?
I used your file in my Paperless v2.14.7 instance in skip mode, and the log was full of errors like this:
[2025-04-04 12:54:33,868] [ERROR] [ocrmypdf.optimize] xref 7147: While extracting this image, an error occurred
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/optimize.py", line 334, in extract_images
result = extract_fn(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/optimize.py", line 224, in extract_image_generic
elif not pim.indexed and pim.colorspace in pim.SIMPLE_COLORSPACES:
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pikepdf/models/image.py", line 211, in colorspace
raise NotImplementedError(
NotImplementedError: not sure how to get colorspace: ['/Separation', '/Black', '/DeviceRGB', pikepdf.Dictionary({
"/C0": [ 1, 1, 1 ],
"/C1": [ Decimal('0.136691'), Decimal('0.121947'), Decimal('0.125305') ],
"/Domain": [ 0, 1 ],
"/FunctionType": 2,
"/N": 1,
"/Range": [ 0, 1, 0, 1, 0, 1 ]
})]
but the document was finally consumed and usable.
@kernie I haven't changed this setting, and I believe its default value is skip.
I am experiencing the same issue with a document that contains personal details. I can share it with the maintainer privately if needed.
EDIT: ah, no, I have a SIGSEGV with my document, not (simply) a non-zero return code.
subprocess.CalledProcessError: Command '['/nix/store/3nspm6rrs988yibwh6szhnfhrysgcydx-ghostscript-10.05.1/bin/gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=RGB', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '/tmp/ocrmypdf.io.94mjv4v2/pdfa.pdf', '-sstdout=%stderr', '/tmp/ocrmypdf.io.94mjv4v2/pdfa.ps', '/tmp/ocrmypdf.io.94mjv4v2/fix_docinfo.pdf']' died with <Signals.SIGSEGV: 11>.
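For anyone comparing their own logs against the two failure modes in this thread ("returned non-zero exit status 1" versus "died with <Signals.SIGSEGV: 11>"), here is a minimal sketch of how the two can be told apart from a return code. It relies on the standard-library behaviour that, on POSIX, `subprocess` reports a child killed by a signal as a negative return code. The helper names `describe_exit` and `run_and_classify` are made up for illustration and are not part of OCRmyPDF or Paperless-ngx.

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Classify a subprocess return code (hypothetical helper)."""
    if returncode == 0:
        return "success"
    if returncode < 0:
        # On POSIX, a negative return code -N means the child was
        # terminated by signal N (for example -11 for SIGSEGV).
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    # A positive value is an ordinary error exit reported by the program itself.
    return f"exited with status {returncode}"


def run_and_classify(cmd: list[str]) -> str:
    """Run a command and describe how it ended (hypothetical helper)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return describe_exit(proc.returncode)
```

An ordinary error exit usually means Ghostscript rejected something in the input, while a signal death means the Ghostscript process itself crashed, so the two are worth reporting separately.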
@ambroisie Please share the document. Brief instructions are here: https://github.com/ocrmypdf/OCRmyPDF/wiki
It's most likely a Ghostscript issue.
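One way to check whether Ghostscript alone is responsible is to re-run it outside Paperless-ngx with the flags copied from the logged command. The sketch below only assembles that argument list. The function name `build_gs_command` and its parameters are hypothetical, and the `pdfa.ps` stub from the log is optional here because it lives in a temporary directory that OCRmyPDF cleans up, so a run without it is an approximation of the original invocation, not an exact replay.

```python
def build_gs_command(input_pdf, output_pdf, ps_stub=None, gs="gs"):
    """Assemble a Ghostscript command using the flags seen in the log.

    Hypothetical helper: the flag list is copied from the logged command;
    the file paths are whatever the caller supplies.
    """
    cmd = [
        gs,
        "-dBATCH",
        "-dNOPAUSE",
        "-dSAFER",
        "-dCompatibilityLevel=1.6",
        "-sDEVICE=pdfwrite",
        "-dAutoRotatePages=/None",
        "-sColorConversionStrategy=RGB",
        "-dPDFSTOPONERROR",
        "-dAutoFilterColorImages=true",
        "-dAutoFilterGrayImages=true",
        "-dJPEGQ=95",
        "-dPDFA=2",
        "-dPDFACompatibilityPolicy=1",
        "-o",
        output_pdf,
        "-sstdout=%stderr",
    ]
    # In the logged command the PostScript stub comes before the input PDF.
    if ps_stub is not None:
        cmd.append(ps_stub)
    cmd.append(input_pdf)
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. If the same failure appears with the original file as input, that points at Ghostscript and not at Paperless-ngx or OCRmyPDF, and the captured stderr is worth attaching to the report.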
@jbarlow83 2025.zip