[Bug]: Hebrew not supported with LibreOffice(?)
Installation Method
Docker
The Problem
When I use "Convert any file to PDF", any file (docx, txt, etc.) written in Hebrew produces a PDF with broken text (the characters appear as empty squares):
You can check the resulting PDF file: hebrew_text_convertedToPDF.pdf
You can test with this simple text file: hebrew_text.txt
Version of Stirling-PDF
0.36.2
Last Working Version of Stirling-PDF
No response
Page Where the Problem Occurred
http://localhost:8080/file-to-pdf
Docker Configuration
version: '3.3'
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    ports:
      - '8080:8080'
    volumes:
      - ./trainingData:/usr/share/tessdata
    environment:
      - DOCKER_ENABLE_SECURITY=false
      - INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false
      - LANGS=en_GB
Relevant Log Output
No response
Additional Information
I assume this is related to issue #420.
Browsers Affected
No response
No Duplicate of the Issue
- [X] I have verified that there are no existing issues raised related to my problem.
Hey there, @MatanMaimon! I'm Dosu, a bot here to help you with bug fixes, answer questions, and guide you on becoming a contributor. Let's work on resolving the issue you're facing while we wait for a human maintainer.
A similar closed issue was found regarding font-related problems with non-Latin characters, specifically Cyrillic characters. The solution suggested in that issue was to address missing fonts by specifying the language code in the configuration file. You can refer to the example provided in the Docker configuration file here: docker-compose-latest.yml. This should help in downloading the correct fonts needed for Hebrew text [1].
In your Docker configuration, you can try adding the Hebrew language code to the LANGS environment variable. For example, you might modify it to include he_IL or another appropriate Hebrew language code. This change should help ensure that the necessary fonts for Hebrew are installed and used during the conversion process.
@dosu I have tried this configuration and it didn't help (of course, I tried rerunning the container, removing and recreating it, etc.):
version: '3.3'
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    ports:
      - '8080:8080'
    volumes:
      - ./trainingData:/usr/share/tessdata
    environment:
      DOCKER_ENABLE_SECURITY: false
      INSTALL_BOOK_AND_ADVANCED_HTML_OPS: false
      LANGS: "en_GB,en_US,he_IL"
      SYSTEM_DEFAULTLOCALE: he_IL
Can you try setting LANGS to "ALL"?
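For reference, a minimal sketch of that suggestion applied to the compose file posted above. Only the LANGS value differs from the reporter's configuration; the boolean-looking values are quoted here so the YAML parser keeps them as strings. That ALL brings in a font covering Hebrew is an assumption drawn from this thread (a later reply reports that it worked), not from documented behavior.

```yaml
version: '3.3'
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    ports:
      - '8080:8080'
    volumes:
      - ./trainingData:/usr/share/tessdata
    environment:
      DOCKER_ENABLE_SECURITY: "false"
      INSTALL_BOOK_AND_ADVANCED_HTML_OPS: "false"
      # Changed from "en_GB,en_US,he_IL" to the value suggested above.
      LANGS: "ALL"
      SYSTEM_DEFAULTLOCALE: he_IL
```

After editing the file, the container has to be recreated (not just restarted) for the new environment values to take effect.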
If adding the Hebrew language code to the LANGS environment variable didn't resolve the issue, you might want to check if the necessary fonts for Hebrew are installed in the Docker container. Sometimes, the issue with non-Latin scripts like Hebrew can be due to missing fonts.
Here are a few steps you can try:
- Install Hebrew Fonts: Ensure that Hebrew fonts are installed in the Docker container. You can do this by modifying the Dockerfile to include the installation of Hebrew fonts. For example, you can add a line to install ttf-freefont or any other package that includes Hebrew fonts.
- Verify Font Configuration: Check if the font configuration in LibreOffice (used by Stirling-PDF for conversion) is set to use the installed Hebrew fonts. This might require accessing the LibreOffice configuration files within the container.
- Check Locale Settings: Ensure that the locale settings in the container are correctly configured for Hebrew. You can do this by setting the LANG and LC_ALL environment variables to he_IL.UTF-8.
- Rebuild the Docker Image: After making these changes, rebuild the Docker image and restart the container to apply the new configurations.
These steps should help in resolving the issue with Hebrew text not displaying correctly in the converted PDF files.
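The locale step above could be expressed in the compose file roughly as follows. This is only a sketch under assumptions: it presumes the he_IL.UTF-8 locale is actually available inside the image, which nothing in this thread confirms, and locale variables on their own do not add missing fonts.

```yaml
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    environment:
      # Assumption: the he_IL.UTF-8 locale exists inside the image.
      LANG: he_IL.UTF-8
      LC_ALL: he_IL.UTF-8
```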
@Frooodle that worked! However, the resulting PDF is LTR. Can't it auto-detect the direction of the source file and set it to RTL when needed? Alternatively, can I explicitly set the resulting PDF's direction to RTL?