Cleanup Scans / OCR error
Docker version 0.25.0 on Ubuntu 16.04 LTS
Error log:
08:18:50.766 [qtp2053647669-44] INFO s.s.SPDF.utils.ProcessExecutor - Running command: ocrmypdf --verbose 2 --output-type pdf --pdf-renderer hocr --deskew --clean --skip-text --language eng /tmp/input_5947355544812649658.pdf /tmp/output_7993803185626034787.pdf
08:18:53.263 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf - ocrmypdf 16.1.1
08:18:53.263 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - Running: ['unpaper', '--version']
08:18:53.976 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - Found unpaper 7.0.0
08:18:53.976 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
08:18:54.015 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - Found tesseract 5.3.4
08:18:54.015 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
08:18:54.023 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
08:18:54.084 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - Found gs 10.2.1
08:18:54.084 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
08:18:54.099 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
08:18:54.134 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = [DS] Profile read from file (tesseract_opencl_profile_devices.dat).
08:18:54.135 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - [DS] Device[1] 0:(null) score is 0.261972
08:18:54.139 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - [DS] Selected Device[1]: "(null)" (Native)
08:18:54.139 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - List of available languages in "/usr/share/tessdata/" (1):
08:18:54.139 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - eng
08:18:54.139 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor -
08:18:54.140 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.helpers - pikepdf mmap enabled
08:18:54.140 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - ERROR ocrmypdf._pipelines._common - ExitCodeException
08:18:54.140 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - Traceback (most recent call last):
08:18:54.140 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler
08:18:54.140 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - return fn(options, plugin_manager)
08:18:54.140 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - ^^^^^^^^^^^^^^^^^^^^^^^^^^^
08:18:54.140 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 166, in _run_pipeline
08:18:54.141 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - check_requested_output_file(options)
08:18:54.141 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - File "/usr/lib/python3.12/site-packages/ocrmypdf/_validation.py", line 310, in check_requested_output_file
08:18:54.141 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - raise OutputFileAccessError(
08:18:54.141 [Thread-15] INFO s.s.SPDF.utils.ProcessExecutor - ocrmypdf.exceptions.OutputFileAccessError: Output file location (/tmp/output_7993803185626034787.pdf) is not a writable file.
08:18:54.243 [qtp2053647669-44] WARN o.e.j.ee10.servlet.ServletChannel - handleException /api/v1/misc/ocr-pdf java.io.IOException: Command process failed with exit code 5. Error message: DEBUG ocrmypdf - ocrmypdf 16.1.1
DEBUG ocrmypdf.subprocess - Running: ['unpaper', '--version']
DEBUG ocrmypdf.subprocess - Found unpaper 7.0.0
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Found tesseract 5.3.4
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Found gs 10.2.1
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = [DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 0.261972
[DS] Selected Device[1]: "(null)" (Native)
List of available languages in "/usr/share/tessdata/" (1):
eng
DEBUG ocrmypdf.helpers - pikepdf mmap enabled
ERROR ocrmypdf._pipelines._common - ExitCodeException
Traceback (most recent call last):
File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler
return fn(options, plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 166, in _run_pipeline
check_requested_output_file(options)
File "/usr/lib/python3.12/site-packages/ocrmypdf/_validation.py", line 310, in check_requested_output_file
raise OutputFileAccessError(
ocrmypdf.exceptions.OutputFileAccessError: Output file location (/tmp/output_7993803185626034787.pdf) is not a writable file.
Based on the write errors, I assume it's a permission issue. Are you running it as a specific user or group ID (UID/GID) that might be causing problems?
Or is a volume mapping over /tmp causing the issue?
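A minimal sketch for testing that assumption from inside the container. The helper name `check_writable` and the `PROBE_DIR` variable are made up for illustration. The script only tries to create and remove a probe file in the directory ocrmypdf writes to (`/tmp` in the log above) and prints the UID/GID it runs as when that fails.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current user can create files in a directory.
check_writable() {
  dir="$1"
  probe="$dir/.write_probe_$$"   # probe file name includes the shell PID to avoid clashes
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"               # clean up so the check leaves nothing behind
    echo "writable: $dir"
  else
    # Show who we are running as, since a UID/GID mismatch is the suspected cause.
    echo "NOT writable: $dir (uid=$(id -u) gid=$(id -g))"
    return 1
  fi
}

# PROBE_DIR is an assumed override; it defaults to /tmp, the path from the error message.
check_writable "${PROBE_DIR:-/tmp}" || true
```

If the script is saved as `check_tmp.sh` (a hypothetical file name), it could be run in the container with `docker exec -i <container> sh < check_tmp.sh`. A "NOT writable" result would point to permissions or a mount covering /tmp rather than to ocrmypdf itself.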
This is my docker command:
docker run -d \
-p 9284:4443 \
-e DOCKER_ENABLE_SECURITY=false \
-e INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false \
-e LANGS=it_IT,en_GB \
-v /mnt/data/docker_data/stirling-pdf/trainingData:/usr/share/tessdata \
-v /mnt/data/docker_data/stirling-pdf/extraConfigs:/configs \
-v /mnt/data/docker_data/stirling-pdf/logs:/logs \
-v /etc/letsencrypt/live/<mydomain>/cert.pem:/configs/cert.pem \
-v /etc/letsencrypt/live/<mydomain>/privkey.pem:/configs/privkey.pem \
--name stirling-pdf \
--restart unless-stopped \
frooodle/s-pdf:latest
GID and UID are the defaults. This is the volume directory structure:
stirling-pdf/:
total 12
drwxr-xr-x 2 1000 1000 4096 May 29 11:15 extraConfigs
drwxr-xr-x 2 1000 1000 4096 Jun 2 22:19 logs
drwxr-xr-x 4 1000 1000 4096 May 29 09:52 trainingData
stirling-pdf/extraConfigs:
total 8
-rwxr-xr-x 1 1000 1000 0 May 29 10:22 cert.pem
-rwxr-xr-x 1 1000 1000 155 May 29 10:35 custom_settings.yml
-rwxr-xr-x 1 1000 1000 0 May 29 10:22 privkey.pem
-rwxr-xr-x 1 1000 1000 3633 Jun 2 22:19 settings.yml
stirling-pdf/logs:
total 8
-rw-r--r-- 1 1000 1000 4103 Jun 2 22:19 info.log
-rwxr-xr-x 1 1000 1000 0 May 29 09:52 invalid-auths.log
stirling-pdf/trainingData:
total 22932
drwxr-xr-x 2 1000 1000 4096 May 29 09:52 configs
-rwxr-xr-x 1 1000 1000 23466654 May 29 09:52 eng.traineddata
-rwxr-xr-x 1 1000 1000 572 May 29 09:52 pdf.ttf
drwxr-xr-x 2 1000 1000 4096 May 29 09:52 tessconfigs
This looks like a bug in the "Correct pages were scanned at a skewed angle by rotating them back into place" feature, if you have that enabled.
I get the same error without the "deskew" option.
Sorry for the late reply. Does this still happen on the latest version? Closing the ticket for now, but happy to re-open it if you can still reproduce the problem.