[Bug]: Secondary OCR Language not showing up
The Problem
Hi dear StirlingPDF Team, lovely service, I use it for everything.
I'm just having a problem with a secondary OCR language not showing up. I have added deu.traineddata, and tesseract is working:
docker exec -it _redacted_ tesseract --list-langs
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 0.240261
[DS] Selected Device[1]: "(null)" (Native)
List of available languages in "/data/tessdata/" (2):
deu
eng
I tried it out inside the container, and it correctly wrote the recognized text to a .txt file, including umlauts.
But inside Stirling-PDF, only English shows up.
I believe I followed every step in How to use OCR
My compose/stack config is below. Everything seems to be in order, and I did not find an env variable I would have to change.
I'm sure it's a simple fix I'm overlooking.
I'm running in Portainer as a stack.
App-Version: 0.27.0
Version of Stirling-PDF
0.27.0
Last Working Version of Stirling-PDF
0.27.0
Page Where the Problem Occurred
https://pdf.trauminselreisen.de
Docker Configuration
version: '3.3'
services:
  stirling-pdf:
    image: frooodle/s-pdf:latest
    ports:
      - '82:8080' # port mapping taken from the container information
    volumes:
      - stirling:/data
      - /home/phil/tessdata:/data/tessdata
    environment:
      - DOCKER_ENABLE_SECURITY=false
      - INSTALL_BOOK_AND_ADVANCED_HTML_OPS=true
      - LANGS=de_DE
      - CUSTOM_FILES_DIR=/data/customFiles
      - UI_APP_NAME=Trauminsel Reisen 🌴 PDF
      - UI_HOME_DESCRIPTION=Alle PDF Tools von Trauminsel Reisen 🌴
      - UI_APP_NAVBAR_NAME= PDF 🌴 Trauminsel Reisen
      - TESSDATA_PREFIX=/data/tessdata
      - CONFIGS_DIR=/data/configs
      - JAVA_TOOL_OPTIONS=-XX:MaxRAMPercentage=75
      - APP_LOCALE=de_DE
      - SYSTEM_DEFAULT_LOCALE=de-DE
      - BASE_URL=https://pdf.trauminselreisen.de
    entrypoint:
      - tini
      - --
      - /scripts/init.sh
    command: ["java", "-Dfile.encoding=UTF-8", "-jar", "/app.jar"]
volumes:
  stirling:
Relevant Log Output
08:30:24.649 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.optimize - xref 11: treating as an optimization candidate
08:30:25.388 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.optimize - XrefExt(xref=11, ext='.png')
08:30:25.388 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.optimize - Optimizable images: JPEGs: 0 PNGs: 1
08:30:25.389 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor -
08:30:25.389 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-XUNRAmg1o3nrUv2lQSIpmQ in page 0
08:30:25.389 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.optimize - xref 11: treating as an optimization candidate
08:30:25.390 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.optimize - xref 11: marking this JPEG as deflatable
08:30:25.396 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor -
08:30:25.397 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.optimize - Recursing into Form XObject /OCR-XUNRAmg1o3nrUv2lQSIpmQ in page 0
08:30:25.397 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.optimize - xref 11: treating as an optimization candidate
08:30:25.397 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.optimize - xref 11: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
08:30:25.398 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.optimize - Optimizable images: JBIG2 groups: 0
08:30:25.398 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor -
08:30:25.402 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.i3_3_8x0/optimize.opt.pdf, /tmp/ocrmypdf.io.i3_3_8x0/optimize.pdf)
08:30:25.402 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - Running: ['jbig2', '--version']
08:30:25.465 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf.subprocess - Running: ['pngquant', '--version']
08:30:25.466 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._pipeline - Image optimization ratio: 3.45 savings: 71.0%
08:30:25.466 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - INFO ocrmypdf._pipeline - Total file size ratio: 3.25 savings: 69.2%
08:30:25.466 [Thread-9] INFO s.s.SPDF.utils.ProcessExecutor - DEBUG ocrmypdf._pipeline - /tmp/ocrmypdf.io.i3_3_8x0/optimize.pdf -> /tmp/output_3809299846436271993.pdf
Additional Information
No response
Browsers Affected
Firefox, Chrome, Other
No Duplicate of the Issue
- [X] I have verified that there are no existing issues raised related to my problem.
/home/phil/tessdata:/data/tessdata
Your path is wrong; all our docs show it as /usr/share/tessdata.
That was fast. Thanks.
I moved it over to /usr/share/tessdata, but the result is the same. Running it manually works (docker exec -it 7a45c9990406 tesseract /data/Briefpapier.png output -l deu),
but it doesn't show up in the GUI.
PS: I restarted the stack, of course.
Can you share a screenshot of the /usr/share/tessdata directory inside the Docker container? I want to know its contents.
Sure, thanks for looking into it
Edit: Sorry, you wrote inside the container:
Ok, so the answer is:
I had tessdata set to the wrong folder inside the container.
It has to be in /usr/share/tessdata inside the container, which I didn't get into my thick skull. After moving it there, everything works fine.
Thank you.
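For anyone landing here with the same symptom: the fix described above comes down to the target side of one volume line in the compose file. A minimal sketch, assuming the same host folder as in the original config (/home/phil/tessdata) and the container path named in the maintainer's reply. The rest of the stack is omitted, and the thread does not say whether the TESSDATA_PREFIX variable was also changed, so it is left out here.

```yaml
services:
  stirling-pdf:
    image: frooodle/s-pdf:latest
    volumes:
      - stirling:/data
      # Host folder holding the .traineddata files, mounted at the path the docs use.
      # In the original config the container side was /data/tessdata.
      - /home/phil/tessdata:/usr/share/tessdata

volumes:
  stirling:
```

A bind mount hides whatever the image already had at that container path, so the host folder should contain every language you want to use. In this thread it held both deu and eng, as the --list-langs output in the report shows.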