[Bug]: Cleanup Scans / OCR error (ref #1329)

Open marcofenoglio opened this issue 1 year ago • 5 comments

Installation Method

Docker

The Problem

In reference to #1329

I tried with the new version 0.29.0 and I still get the same error: Output file location (/tmp/output_16848536940110331072.pdf) is not a writable file.

Version of Stirling-PDF

0.29.0

Last Working Version of Stirling-PDF

No response

Page Where the Problem Occurred

https://mydomain/pdf/ocr-pdf

Docker Configuration

docker run -d \
  --name stirling-pdf \
  -p 9284:8080 \
  -e SYSTEM_ROOTURIPATH=/pdf \
  -e DOCKER_ENABLE_SECURITY=false \
  -e INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false \
  -e LANGS=it_IT,en_GB \
  -v /mnt/data/docker_data/stirling-pdf/trainingData:/usr/share/tessdata \
  -v /mnt/data/docker_data/stirling-pdf/extraConfigs:/configs \
  -v /mnt/data/docker_data/stirling-pdf/logs:/logs \
  --restart unless-stopped \
  frooodle/s-pdf:0.29.0

Relevant Log Output

12:06:39.569 [qtp2050525584-35] INFO  s.s.SPDF.utils.ProcessExecutor - Running command: ocrmypdf --verbose 2 --output-type pdf --pdf-renderer hocr --deskew --skip-text --language eng /tmp/input_12592559430707309562.pdf /tmp/output_16848536940110331072.pdf
12:06:40.226 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf - ocrmypdf 16.1.1
12:06:40.226 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
12:06:40.234 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf.subprocess - Found tesseract 5.3.4
12:06:40.235 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
12:06:40.243 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
12:06:40.258 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf.subprocess - Found gs 10.3.1
12:06:40.258 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
12:06:40.274 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
12:06:40.286 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = [DS] Profile read from file (tesseract_opencl_profile_devices.dat).
12:06:40.287 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor - [DS] Device[1] 0:(null) score is 0.390008
12:06:40.287 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor - [DS] Selected Device[1]: "(null)" (Native)
12:06:40.288 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor - List of available languages in "/usr/share/tessdata/" (1):
12:06:40.288 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor - eng
12:06:40.289 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -
12:06:40.289 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   DEBUG ocrmypdf.helpers - pikepdf mmap enabled
12:06:40.291 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   ERROR ocrmypdf._pipelines._common - ExitCodeException
12:06:40.292 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor - Traceback (most recent call last):
12:06:40.292 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler
12:06:40.293 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -     return fn(options, plugin_manager)
12:06:40.294 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
12:06:40.294 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 166, in _run_pipeline
12:06:40.295 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -     check_requested_output_file(options)
12:06:40.295 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -   File "/usr/lib/python3.12/site-packages/ocrmypdf/_validation.py", line 310, in check_requested_output_file
12:06:40.295 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor -     raise OutputFileAccessError(
12:06:40.296 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor - ocrmypdf.exceptions.OutputFileAccessError: Output file location (/tmp/output_16848536940110331072.pdf) is not a writable file.
12:06:40.419 [qtp2050525584-35] WARN  o.e.j.ee10.servlet.ServletChannel - handleException /pdf/api/v1/misc/ocr-pdf java.io.IOException: Command process failed with exit code 5. Error message:   DEBUG ocrmypdf - ocrmypdf 16.1.1
  DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
  DEBUG ocrmypdf.subprocess - Found tesseract 5.3.4
  DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
  DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
  DEBUG ocrmypdf.subprocess - Found gs 10.3.1
  DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
  DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
  DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = [DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 0.390008
[DS] Selected Device[1]: "(null)" (Native)
List of available languages in "/usr/share/tessdata/" (1):
eng

  DEBUG ocrmypdf.helpers - pikepdf mmap enabled
  ERROR ocrmypdf._pipelines._common - ExitCodeException
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler
    return fn(options, plugin_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 166, in _run_pipeline
    check_requested_output_file(options)
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_validation.py", line 310, in check_requested_output_file
    raise OutputFileAccessError(
ocrmypdf.exceptions.OutputFileAccessError: Output file location (/tmp/output_16848536940110331072.pdf) is not a writable file.

Additional Information

No response

Browsers Affected

No response

No Duplicate of the Issue

  • [X] I have verified that there are no existing issues raised related to my problem.

marcofenoglio avatar Sep 17 '24 12:09 marcofenoglio

Your ticket says "Installation Method: None".

Without this we can't help or debug.

Frooodle avatar Sep 17 '24 12:09 Frooodle

I updated the ticket! I use Docker

marcofenoglio avatar Sep 17 '24 13:09 marcofenoglio

When I run the container without the default seccomp profile, using --security-opt seccomp=unconfined, the error no longer occurs. How can I find out which syscall is causing the problem?

marcofenoglio avatar Oct 01 '24 07:10 marcofenoglio
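One way to narrow this down before tracing individual syscalls is to compare an advisory writability check with a real write inside the container. This is only a sketch: the script name check_tmp_write.sh and the function check_tmp_write are hypothetical, and whether the shell's test -w takes the same syscall path as ocrmypdf's own check depends on the shell and libc in the image, so a matching result does not rule seccomp out. A commonly reported cause of this pattern is an older Docker or libseccomp that does not recognise newer syscalls such as faccessat2; that is an assumption here, not something the logs above confirm.

```shell
#!/bin/sh
# Hypothetical helper: compare the advisory writability check with a real write.
# Possible usage inside the container (container name taken from the report):
#   docker exec -i stirling-pdf sh < check_tmp_write.sh
check_tmp_write() {
    dir="${1:-/tmp}"
    f=$(mktemp "$dir/output_XXXXXX") || { echo "mktemp failed in $dir"; return 2; }

    # Advisory check: asks whether a write would be permitted, without writing.
    if [ -w "$f" ]; then advisory=yes; else advisory=no; fi

    # Real check: actually append a byte to the file.
    if printf 'x' >> "$f" 2>/dev/null; then actual=yes; else actual=no; fi

    rm -f "$f"
    echo "advisory=$advisory actual=$actual"

    # A mismatch (advisory=no, actual=yes) points at the access check itself
    # rather than at file permissions. Returns non-zero on a mismatch.
    [ "$advisory" = "$actual" ]
}

check_tmp_write /tmp
```

If the two results disagree only under the default seccomp profile, tracing the process with a tool such as strace (if it is available in the image) should show which call is being rejected.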

Running the container without the default seccomp profile, using --security-opt seccomp=unconfined, I get no more errors. How can I check which system call caused the problem?

How did you solve the OCR error code 5? I also have this problem

abi486153 avatar Oct 08 '24 03:10 abi486153

o.e.j.ee10.servlet.ServletChannel

Running the container with --security-opt seccomp=unconfined, I no longer get the error.

marcofenoglio avatar Oct 08 '24 10:10 marcofenoglio
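For anyone applying the same workaround, here is a sketch of the docker run command from the original report with the seccomp override added. Everything except the --security-opt line is copied from the report. The function name start_stirling_pdf and the DOCKER variable are hypothetical, added only so the command can be printed for review instead of executed.

```shell
#!/bin/sh
# Sketch: the reporter's original command plus the seccomp override.
# DOCKER is a hypothetical override for a dry run, for example:
#   DOCKER="echo docker" start_stirling_pdf
start_stirling_pdf() {
    ${DOCKER:-docker} run -d \
        --name stirling-pdf \
        --security-opt seccomp=unconfined \
        -p 9284:8080 \
        -e SYSTEM_ROOTURIPATH=/pdf \
        -e DOCKER_ENABLE_SECURITY=false \
        -e INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false \
        -e LANGS=it_IT,en_GB \
        -v /mnt/data/docker_data/stirling-pdf/trainingData:/usr/share/tessdata \
        -v /mnt/data/docker_data/stirling-pdf/extraConfigs:/configs \
        -v /mnt/data/docker_data/stirling-pdf/logs:/logs \
        --restart unless-stopped \
        frooodle/s-pdf:0.29.0
}
```

Note that seccomp=unconfined turns off syscall filtering for the container entirely, so it works around the error rather than fixing its cause.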