OSS-DocumentScanner [BUG] OCR for German language quite inaccurate

Which app is your issue for

Document Scanner

Version

1.14.5 Build 121

What platform are you using?

Android

OS Version

GrapheneOS latest

What happened?

The OCR text extracted from the scanned documents is not of good quality. I used the "best" OCR quality setting with "German" as the language. The resulting text is not as good as what I am used to from other document scanner apps: there are unwanted spaces, misspelled words, stray special characters, wrong words, etc. This makes it hard to search for text within the PDF documents later on.

Maybe it is possible to add another OCR engine or improve the German recognition. Thank you very much.

Relevant log output

Code of Conduct

[x] I agree to follow this project's Code of Conduct

Apr 30 '25 20:04 drp4positive

@drp4positive I am sorry to hear that. The app relies on Tesseract for OCR. To my knowledge it is the only OCR engine that is good enough across all languages, though it may indeed be worse for some. German should still be pretty much as good as English or French. You can create an issue on their repo to see what they think of it.
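For an upstream report it helps to reproduce the result with plain Tesseract outside the app. The following is a minimal sketch of that, assuming the `tesseract` command-line tool is installed on a desktop machine and the German model (`deu`) is available. The file names and the `tessdata_best` directory are placeholders, and the idea that the app's "best" setting corresponds to the `tessdata_best` models is an assumption, not something confirmed in this thread.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_cmd(image_path: str, out_base: str, lang: str = "deu",
                        tessdata_dir: Optional[str] = None) -> List[str]:
    """Build a Tesseract CLI invocation; nothing is executed here."""
    # Basic form: tesseract <image> <output base name> -l <language code>
    cmd = ["tesseract", image_path, out_base, "-l", lang]
    if tessdata_dir is not None:
        # Point Tesseract at a specific model directory (hypothetical path).
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_ocr(image_path: str, out_base: str, lang: str = "deu",
            tessdata_dir: Optional[str] = None) -> str:
    """Run Tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_cmd(image_path, out_base, lang, tessdata_dir)
    subprocess.run(cmd, check=True)
    # With the default text output, Tesseract writes <out_base>.txt
    return out_base + ".txt"


if __name__ == "__main__":
    # Placeholder file name; replace with a real scan before running.
    print(" ".join(build_tesseract_cmd("scan.png", "scan_ocr", lang="deu")))
```

Attaching the input image, the exact command, and the resulting text file gives the Tesseract maintainers something they can reproduce. For Russian scans the language code would be `rus`.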

May 02 '25 14:05 farfromrefug

Thanks @farfromrefug. I used better light (daylight) to take the photos and the OCR result is better, but still not as good as what I was used to from my old app. I am happy to use a FOSS product, though, and I will take up your suggestion and let the Tesseract people know.

Greetings

May 02 '25 19:05 drp4positive

I see the same situation with OCR of Russian texts.

OSS Document Scanner 1.14.5.121 (2025-02-17):

ABBYY FineReader PDF 16.0.14.6564; part 1435.8:

PDF-XChange Editor 10.7.1, build 399 (Enhanced OCR ≡ ABBYY OCR 12):

I understand that Tesseract is pretty bad at recognizing non-English texts with imperfect letter outlines. But maybe it makes sense to think about using a cloud service API based on neural networks? Such services currently recognize even handwritten text very well, unlike traditional OCR engines, which require many hours of training for each handwriting and still recognize handwritten text poorly.

Sep 11 '25 12:09 Korb

@Korb @drp4positive I think this is because the Cyrillic characters are not printed. You can try this build (GitHub/F-Droid build with Sentry enabled): https://github.com/Akylas/OSS-DocumentScanner/releases/tag/webdav_test Please report whether it is better.

Sep 12 '25 12:09 farfromrefug

OCR settings in both cases:

OSS Document Scanner 1.14.5.121 (2025-02-17):

OSS Document Scanner 1.14.5.121 (2025-09-12):

Clearly improved OCR quality!
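To put a number on such a comparison, one option is the character error rate (CER): the edit distance between the OCR output and a hand-typed reference, divided by the reference length. The sketch below is a plain-Python illustration and not part of the app or of Tesseract. The function names are hypothetical and the sample strings are made up, not taken from the scans in this thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_text: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_text) / len(reference)


if __name__ == "__main__":
    # Made-up example strings for illustration only.
    reference = "Sehr geehrte Damen und Herren"
    old_build = "Sehr ge ehrte Darnen und Herren"
    new_build = "Sehr geehrte Damen und Herren"
    print(character_error_rate(reference, old_build))
    print(character_error_rate(reference, new_build))
```

A lower value means the output is closer to the reference. Running the same scan through two builds and comparing their CER against one reference makes a report less dependent on visual impression.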

Sep 12 '25 15:09 Korb

@Korb What different options have you used for the better result and for the not so good result? Thanks

Sep 12 '25 19:09 drp4positive

What different options have you used for the better result and for the not so good result?

OCR settings in both cases: Quality: Best; Languages: Russian.

Or do you mean desktop apps' settings?

Sep 13 '25 06:09 Korb

The improvements come from the updated Tesseract and from the Cyrillic character rendering.

Sep 13 '25 06:09 farfromrefug
