[Bug]: OCR doesn't work on Digital Ocean

Open alexandrevannier opened this issue 1 year ago • 1 comments

Installation Method

None

The Problem

Hello, I deployed the app with the one-click "deploy to digital ocean" button (App Platform), which is awesome because it's very simple. But once the app is ready to use, if we go to Cleanups Scans / OCR, there is no way to upload a document. The other options work. Inside the container, tesseract 5.3 is installed.

image

In my local environment with Docker Compose, it works like a charm!

Do you have any idea why OCR is not working out of the box on Digital Ocean?

Version of Stirling-PDF

0.34.0

Last Working Version of Stirling-PDF

No response

Page Where the Problem Occurred

https://stirling-pdf-xjlfl.ondigitalocean.app/ocr-pdf

Docker Configuration

No response

Relevant Log Output

No response

Additional Information

No response

Browsers Affected

No response

No Duplicate of the Issue

  • [X] I have verified that there are no existing issues raised related to my problem.

alexandrevannier, Nov 28 '24 04:11

Hey @alexandrevannier! I'm here to help you with any bugs, questions, or contributions you have for the Stirling-PDF project. Let me know how I can assist you!

A similar issue was previously discussed, where the OCR functionality was not working on Digital Ocean for Stirling-PDF version 0.34.0. The solution involved ensuring that the paths for the tessdata locations are correctly set in the Docker setup. Specifically, the tessdata directory should be set to /usr/share/tessdata in your Docker configuration on Digital Ocean. This path change is crucial for the OCR to function properly [1][2].

Make sure your docker-compose.yml file on Digital Ocean includes the correct volume configuration:

services:
  your_service_name:
    image: your_docker_image_name
    volumes:
      - /location/of/trainingData:/usr/share/tessdata

Additionally, ensure that the necessary language packs are installed in the /usr/share/tessdata directory. You can download the desired .traineddata files from the Tesseract GitHub repositories, such as tessdata_fast or tessdata, and place them in the specified directory. Remember not to remove the existing eng.traineddata file, as it is required for the OCR to function.
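
The tessdata check described above could be sketched roughly as follows. This is only an illustrative sketch, not part of Stirling-PDF: the helper names `list_tessdata_languages` and `check_tessdata` are hypothetical, and it assumes the language packs sit directly in /usr/share/tessdata as `<lang>.traineddata` files.

```python
from pathlib import Path


def list_tessdata_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file in tessdata_dir."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        # A missing directory (for example, a volume that was never mounted)
        # simply yields no languages.
        return []
    return sorted(p.stem for p in directory.glob("*.traineddata"))


def check_tessdata(tessdata_dir="/usr/share/tessdata", required=("eng",)):
    """Return the required language codes that are missing from tessdata_dir."""
    available = set(list_tessdata_languages(tessdata_dir))
    return [lang for lang in required if lang not in available]


if __name__ == "__main__":
    # Hypothetical usage: run inside the container to see what is installed.
    print("Installed languages:", list_tessdata_languages("/usr/share/tessdata"))
    missing = check_tessdata()
    if missing:
        print("Missing required language packs:", ", ".join(missing))
    else:
        print("eng.traineddata is present.")
```

Running this inside the container should show whether eng.traineddata is visible at the expected path. An empty list would suggest the directory is missing or the mounted volume contains no language packs.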

dosubot[bot], Nov 28 '24 04:11

Should work now!

Frooodle, Feb 10 '25 10:02