
[Bug] Crash when trying to add 'fraktur' languages

Open RedSnt opened this issue 5 months ago • 2 comments

What happened?

I'm very happy with normcap; it's an excellent tool for simple OCR jobs (like memes, which I can then translate more easily). I wanted to add both Danish and German, and each of these languages has a "fraktur" version available when adding languages.
But when I try to download either of these, I get an error.

I can see in my HOSTS filter that the 404 error is not caused by one of my filters blocking the address, so I'm not sure what it could be. A parsing error, perhaps?

How did you install NormCap?

Flatpak (Flathub)

Operating System + Version?

Nobara 42 (Fedora 42 based)

[Linux only] Display Server (DS) + Desktop environment (DE)?

Wayland + KDE Plasma

Debug log output?

14:30:37 - WARNING - normcap.gui.dbus:160 - Failed to move window via org.kde.kwin.Scripting!
14:30:37 - WARNING - normcap.gui.dbus:160 - Failed to move window via org.kde.kwin.Scripting!
14:30:49 - ERROR   - normcap.gui.downloader:57 - Could not download 'https://github.com/tesseract-ocr/tessdata_fast/raw/4.1.0/dan_frak.traineddata'
Traceback (most recent call last):
  File "/app/lib/python3.11/site-packages/normcap/gui/downloader.py", line 48, in run
    with urlopen(  # noqa: S310
         ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 563, in error
    return self._call_chain(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
14:30:49 - CRITICAL - normcap:148 - Uncaught exception!
Traceback (most recent call last):
  File "/app/lib/python3.11/site-packages/normcap/gui/language_manager.py", line 95, in _on_download_error
    QtWidgets.QMessageBox.critical(
TypeError: PySide6.QtWidgets.QMessageBox.critical(): not enough arguments. Note: keyword arguments are only supported for optional parameters.
14:30:49 - CRITICAL - normcap:151 - System info: {'normcap_version': '0.5.9', 'python_version': '3.11.11', 'cli_args': '/app/bin/normcap', 'is_briefcase_package': False, 'is_flatpak_package': True, 'is_appimage_package': False, 'platform': 'linux', 'desktop_environment': <DesktopEnvironment.KDE: 3>, 'display_manager_is_wayland': True, 'pyside6_version': '6.7.1', 'qt_version': '6.7.1', 'qt_library_path': '/usr/share/runtime/lib/plugins, /app/lib/python3.11/site-packages/PySide6/Qt/plugins, /usr/bin', 'locale': 'DEFAULT', 'config_directory': PosixPath('/home/redsnt/.var/app/com.github.dynobo.normcap/config/normcap'), 'resources_path': PosixPath('/app/lib/python3.11/site-packages/normcap/resources'), 'tesseract_path': PosixPath('/app/bin/tesseract'), 'tessdata_path': PosixPath('/home/redsnt/.var/app/com.github.dynobo.normcap/config/normcap/tessdata'), 'envs': {'TESSDATA_PREFIX': '/app/share', 'LD_LIBRARY_PATH': ''}, 'screens': [Screen(left=2560, top=180, right=4479, bottom=1259, device_pixel_ratio=1.0, index=0, screenshot=None), Screen(left=0, top=0, right=2559, bottom=1439, device_pixel_ratio=1.0, index=1, screenshot=None)]}
14:30:49 - CRITICAL - normcap:152 - Unfortunately, NormCap has to be terminated due to an unknown problem.
Please help improve NormCap by reporting this error, including the output above, on
https://github.com/dynobo/normcap/issues/new
Thanks!

RedSnt avatar Jun 03 '25 12:06 RedSnt

Thanks for reporting this!

~~It seems like the url for Danish is not correct: NormCap is trying to load~~ ~~https://github.com/tesseract-ocr/tessdata_fast/raw/4.1.0/dan_frak.traineddata~~ ~~while the correct one seems to be~~ ~~https://github.com/tesseract-ocr/tessdata_fast/raw/4.1.0/dan.traineddata~~

~~Can you try to download e.g. German (DE) and report back if that works?~~

~~I'll fix the URL for the next NormCap version.~~

~~Until then, a workaround is to download the dan.traineddata-file manually in your browser and move it into ~/.config/normcap/tessdata/. Don't forget to restart NormCap afterwards.~~ ~~The language dan might still not be shown in NormCap's language manager, but it should become visible in the settings menu, where you can activate it.~~

Edit: Oh, wait, dan is already available in the Language Manager; you are explicitly looking for dan_frak for the Fraktur font. Interestingly, this model seems to be special: unlike the other models, there seems to be no fast version of it (and no best version either). But there is a default one.

Manual Workaround:

  1. Download the model from https://github.com/tesseract-ocr/tessdata/raw/refs/tags/4.1.0/dan_frak.traineddata
  2. Move it into ~/.config/normcap/tessdata/
  3. Restart NormCap
  4. Activate dan_frak in the NormCap settings menu

I will fix that by using this model for dan_frak (instead of "fast") in the next version of NormCap.
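
For anyone who prefers to script steps 1 and 2 of the workaround, here is a minimal sketch using only the Python standard library. The URL pattern is the one from step 1; `fetch_model` and its `opener` parameter are hypothetical names made up for this example and are not part of NormCap.

```python
from pathlib import Path
from urllib.request import urlopen

# URL pattern copied from step 1 of the manual workaround.
MODEL_URL = (
    "https://github.com/tesseract-ocr/tessdata/raw/refs/tags/4.1.0/"
    "{lang}.traineddata"
)


def fetch_model(lang, tessdata_dir, opener=urlopen):
    """Download one traineddata file into tessdata_dir and return its path.

    `opener` is a hypothetical hook (defaults to urllib's urlopen) so the
    function can be tried out without network access.
    """
    target_dir = Path(tessdata_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{lang}.traineddata"
    with opener(MODEL_URL.format(lang=lang)) as response:
        target.write_bytes(response.read())
    return target
```

Calling `fetch_model("dan_frak", "~/.config/normcap/tessdata/")` would cover steps 1 and 2; restarting NormCap and activating the language in the settings menu still have to be done by hand.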

dynobo avatar Jun 03 '25 13:06 dynobo

Great to see you're on top of this already. In case you still needed the German fraktur variant, here is the error msg:

20:58:47 - ERROR   - normcap.gui.downloader:57 - Could not download 'https://github.com/tesseract-ocr/tessdata_fast/raw/4.1.0/deu_frak.traineddata'
Traceback (most recent call last):
  File "/app/lib/python3.11/site-packages/normcap/gui/downloader.py", line 48, in run
    with urlopen(  # noqa: S310
         ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 563, in error
    return self._call_chain(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
20:58:47 - CRITICAL - normcap:148 - Uncaught exception!
Traceback (most recent call last):
  File "/app/lib/python3.11/site-packages/normcap/gui/language_manager.py", line 95, in _on_download_error
    QtWidgets.QMessageBox.critical(
TypeError: PySide6.QtWidgets.QMessageBox.critical(): not enough arguments. Note: keyword arguments are only supported for optional parameters.
20:58:47 - CRITICAL - normcap:151 - System info: {'normcap_version': '0.5.9', 'python_version': '3.11.11', 'cli_args': '/app/bin/normcap', 'is_briefcase_package': False, 'is_flatpak_package': True, 'is_appimage_package': False, 'platform': 'linux', 'desktop_environment': <DesktopEnvironment.KDE: 3>, 'display_manager_is_wayland': True, 'pyside6_version': '6.7.1', 'qt_version': '6.7.1', 'qt_library_path': '/usr/share/runtime/lib/plugins, /app/lib/python3.11/site-packages/PySide6/Qt/plugins, /usr/bin', 'locale': 'DEFAULT', 'config_directory': PosixPath('/home/redsnt/.var/app/com.github.dynobo.normcap/config/normcap'), 'resources_path': PosixPath('/app/lib/python3.11/site-packages/normcap/resources'), 'tesseract_path': PosixPath('/app/bin/tesseract'), 'tessdata_path': PosixPath('/home/redsnt/.var/app/com.github.dynobo.normcap/config/normcap/tessdata'), 'envs': {'TESSDATA_PREFIX': '/app/share', 'LD_LIBRARY_PATH': ''}, 'screens': [Screen(left=2560, top=180, right=4479, bottom=1259, device_pixel_ratio=1.0, index=0, screenshot=None), Screen(left=0, top=0, right=2559, bottom=1439, device_pixel_ratio=1.0, index=1, screenshot=None)]}
20:58:47 - CRITICAL - normcap:152 - Unfortunately, NormCap has to be terminated due to an unknown problem.
Please help improve NormCap by reporting this error, including the output above, on
https://github.com/dynobo/normcap/issues/new
Thanks!

I probably should've mentioned I use the flatpak version, but I quickly found the right config and tessdata folder at ~/.var/app/com.github.dynobo.normcap/config/normcap/tessdata/.

RedSnt avatar Jun 04 '25 19:06 RedSnt

In #761, I added fallback logic: if a language is not found (404) among the "fast" models, NormCap will try to download the "normal" model or the "best" model.
Only if the model doesn't exist in any of the 3 variants (which should not be the case) will an error be displayed.
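
The idea can be illustrated roughly as follows. This is only a sketch of the fallback described above, not the actual code from #761: the function name `download_with_fallback`, the `opener` parameter, the exact order "fast", "normal", "best", and the assumption that all three variants share the URL pattern seen in the log are made up for this example.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# Assumed fallback order: "fast", then "normal", then "best".
REPOS = ("tessdata_fast", "tessdata", "tessdata_best")
# Assumed URL pattern, modeled on the "fast" URL from the debug log.
URL_TEMPLATE = "https://github.com/tesseract-ocr/{repo}/raw/4.1.0/{lang}.traineddata"


def download_with_fallback(lang, opener=urlopen):
    """Return (url, data) from the first variant that provides the model."""
    last_error = None
    for repo in REPOS:
        url = URL_TEMPLATE.format(repo=repo, lang=lang)
        try:
            with opener(url) as response:
                return url, response.read()
        except HTTPError as exc:
            if exc.code != 404:
                raise  # only "not found" should trigger the fallback
            last_error = exc
    # None of the three variants has the model: report a single error.
    raise FileNotFoundError(f"No model for {lang!r} in any variant") from last_error
```

With this shape, a 404 for dan_frak or deu_frak among the "fast" models simply moves on to the next variant, and the caller only has to handle one final error.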

dynobo avatar Aug 03 '25 15:08 dynobo