[Bug]: Hebrew not supported with LibreOffice(?)

Open MatanMaimon opened this issue 1 year ago • 5 comments

Installation Method

Docker

The Problem

When using "Convert any file to PDF", any file (docx, txt, etc.) written in Hebrew produces a PDF with broken text (the characters render as empty squares): image

You can check the resulting PDF file: hebrew_text_convertedToPDF.pdf

You can reproduce it with this simple text file: hebrew_text.txt

Version of Stirling-PDF

0.36.2

Last Working Version of Stirling-PDF

No response

Page Where the Problem Occurred

http://localhost:8080/file-to-pdf

Docker Configuration

version: '3.3'
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    ports:
      - '8080:8080'
    volumes:
      - ./trainingData:/usr/share/tessdata
    environment:
      - DOCKER_ENABLE_SECURITY=false
      - INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false
      - LANGS=en_GB

Relevant Log Output

No response

Additional Information

I assume this is related to issue #420.

Browsers Affected

No response

No Duplicate of the Issue

  • [X] I have verified that there are no existing issues raised related to my problem.

MatanMaimon avatar Dec 18 '24 10:12 MatanMaimon

Hey there, @MatanMaimon! I'm Dosu, a bot here to help you with bug fixes, answer questions, and guide you on becoming a contributor. Let's work on resolving the issue you're facing while we wait for a human maintainer.

A similar closed issue was found regarding font-related problems with non-Latin characters, specifically Cyrillic characters. The solution suggested in that issue was to address missing fonts by specifying the language code in the configuration file. You can refer to the example provided in the Docker configuration file here: docker-compose-latest.yml. This should help in downloading the correct fonts needed for Hebrew text [1].

In your Docker configuration, you can try adding the Hebrew language code to the LANGS environment variable. For example, you might modify it to include he_IL or another appropriate Hebrew language code. This change should help ensure that the necessary fonts for Hebrew are installed and used during the conversion process.

To continue talking to Dosu, mention @dosu.



dosubot[bot] avatar Dec 18 '24 10:12 dosubot[bot]

@dosu I tried this configuration and it didn't help (of course I restarted the container, removed and re-created it, etc.):

version: '3.3'
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    ports:
      - '8080:8080'
    volumes:
      - ./trainingData:/usr/share/tessdata
    environment:
      DOCKER_ENABLE_SECURITY: false
      INSTALL_BOOK_AND_ADVANCED_HTML_OPS: false
      LANGS: "en_GB,en_US,he_IL"
      SYSTEM_DEFAULTLOCALE: he_IL

MatanMaimon avatar Dec 18 '24 11:12 MatanMaimon

Can you try setting LANGS to "ALL"?
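
A minimal sketch of that suggestion, reusing the compose file posted earlier in this thread with only the LANGS value changed. Quoting the boolean values is an addition made here for compatibility with older Compose versions; nothing in the thread requires it.

```yaml
version: '3.3'
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    ports:
      - '8080:8080'
    volumes:
      - ./trainingData:/usr/share/tessdata
    environment:
      # Booleans are quoted so that every Compose version reads them as strings.
      DOCKER_ENABLE_SECURITY: "false"
      INSTALL_BOOK_AND_ADVANCED_HTML_OPS: "false"
      # The value suggested in this thread, replacing the explicit
      # "en_GB,en_US,he_IL" list that did not fix the Hebrew output.
      LANGS: "ALL"
```

The container has to be re-created after the change (for example with `docker compose up -d`) so that the new environment value takes effect.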

Frooodle avatar Dec 18 '24 11:12 Frooodle

If adding the Hebrew language code to the LANGS environment variable didn't resolve the issue, you might want to check if the necessary fonts for Hebrew are installed in the Docker container. Sometimes, the issue with non-Latin scripts like Hebrew can be due to missing fonts.

Here are a few steps you can try:

  1. Install Hebrew Fonts: Ensure that Hebrew fonts are installed in the Docker container. You can do this by modifying the Dockerfile to include the installation of Hebrew fonts. For example, you can add a line to install ttf-freefont or any other package that includes Hebrew fonts.

  2. Verify Font Configuration: Check if the font configuration in LibreOffice (used by Stirling-PDF for conversion) is set to use the installed Hebrew fonts. This might require accessing the LibreOffice configuration files within the container.

  3. Check Locale Settings: Ensure that the locale settings in the container are correctly configured for Hebrew. You can do this by setting the LANG and LC_ALL environment variables to he_IL.UTF-8.

  4. Rebuild the Docker Image: After making these changes, rebuild the Docker image and restart the container to apply the new configurations.

These steps should help in resolving the issue with Hebrew text not displaying correctly in the converted PDF files.
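
Step 3 above could be expressed in the compose file roughly as follows. This is only a sketch of the bot's suggestion, not a confirmed fix: it assumes the he_IL.UTF-8 locale is actually available inside the image, which this thread does not verify.

```yaml
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    environment:
      # Assumption: the he_IL.UTF-8 locale exists in the container.
      # If it does not, these variables will likely have no useful effect.
      LANG: he_IL.UTF-8
      LC_ALL: he_IL.UTF-8
```

Locale variables alone do not add fonts, so the empty-square glyphs described in the issue would still depend on Hebrew-capable fonts being present in the container.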


dosubot[bot] avatar Dec 18 '24 11:12 dosubot[bot]

@Frooodle that worked! However, the resulting PDF is LTR. Can't it auto-detect the source file's text direction and set it to RTL when needed? Alternatively, can I explicitly set the resulting PDF's direction to RTL?

MatanMaimon avatar Dec 18 '24 11:12 MatanMaimon