
Japanese Language Load Fails with UnicodeDecodeError (shift_jis)

Open lixiao888 opened this issue 9 months ago • 1 comments

Description

The Lute application crashes with a UnicodeDecodeError when loading the predefined Japanese language settings. The error occurs in the natto library, which is used for MeCab integration, when natto.MeCab decodes bytes from MeCab's output with the 'shift_jis' codec. The traceback reports that byte 0x96 at position 0 starts an incomplete multibyte sequence: 0x96 is a valid Shift_JIS lead byte, but the input ends before its trail byte. This suggests either a mismatch between MeCab's actual output encoding (possibly UTF-8) and the encoding natto decodes with (Shift_JIS), or a byte string that was cut in the middle of a character.

ERROR:lute.app_factory:Exception on /language/load_predefined/Japanese [GET]
Traceback (most recent call last):
  File "E:\Python-WorkSpace\my_lute\.venv\Lib\site-packages\natto\mecab.py", line 401, in __parse_tonodes
    surf = self.__bytes2str(raws).strip(self._STRIP_WHITESPACE)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python-WorkSpace\my_lute\.venv\Lib\site-packages\natto\support.py", line 17, in bytes2str
    return b.decode(enc)
           ^^^^^^^^^^^^^
UnicodeDecodeError: 'shift_jis' codec can't decode byte 0x96 in position 0: incomplete multibyte sequence
decoding with 'shift-jis' codec failed
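
The decode failure at the bottom of the traceback can be reproduced without MeCab or natto. The following is a minimal sketch using only the Python standard library; the helper name try_decode and the sample text are illustrative assumptions and are not taken from Lute or natto. It shows that a lone 0x96 byte produces the same "incomplete multibyte sequence" reason, while complete Shift_JIS bytes decode normally.

```python
def try_decode(raw: bytes, encoding: str) -> str:
    """Return the decoded text, or the decoder's error reason if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as err:
        return f"error: {err.reason}"


# 0x96 is a Shift_JIS lead byte, so on its own it is an unfinished character.
lone_lead_byte = try_decode(b"\x96", "shift_jis")

# Complete Shift_JIS bytes (lead byte plus trail byte per character) decode normally.
round_trip = try_decode("日本語".encode("shift_jis"), "shift_jis")

# UTF-8 bytes for the same text do not decode back to it as Shift_JIS:
# the result is either an error or mojibake.
utf8_as_sjis = try_decode("日本語".encode("utf-8"), "shift_jis")

print(lone_lead_byte)
print(round_trip)
print(utf8_as_sjis)
```

If the bytes natto receives from MeCab behave like the first case, the cause may be truncation rather than a wrong codec; if they behave like the third case, an encoding mismatch is more likely.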

To Reproduce

  1. Start the Lute application (e.g., python -m lute).
  2. Access the Lute web interface in a browser.
  3. Navigate to the "Settings" or "Languages" section of the application.
  4. Attempt to load or apply the "Japanese" predefined language settings. This usually involves clicking a button or link that triggers the /language/load_predefined/Japanese GET endpoint.
  5. Observe the application crash or the UnicodeDecodeError appearing in the server's console/logs.

Screenshots

[If you have any screenshots showing the UI before the crash or the error message within the browser developer console (if visible), please add them here. For a server-side error like this, console logs are often more informative.]

Extra software info, if not already included in the Description:

  • OS: Windows (inferred from E:\Python-WorkSpace in traceback)
  • Browser: Any modern browser (e.g., Chrome, Firefox, Edge). The issue is server-side, triggered by a client request.
  • How you've installed Lute: Python (likely via pip into a virtual environment, inferred from my_lute\.venv\Lib\site-packages path)
  • Version: [Please provide the Lute version you are using, if known. You can often find this in pyproject.toml, setup.py, or within the application's "About" page.]

lixiao888 · May 31 '25 14:05

Hi @lixiao888 -- apologies for the very late reply.

Hm, this is really tough. I don't use or have Windows and so can't look into this.

I have run into problems with mecab/natto when the Python architecture (and maybe version?) of mecab/natto didn't match the architecture of my Mac. I was out of my depth when investigating this ... iirc it was something like my mecab or natto was using x86, but my computer was arm64 -- or something to that effect.
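
For anyone comparing architectures, the interpreter side can be collected with a short standard-library sketch like the one below. The function name interpreter_info is hypothetical, and the script only describes the running Python; it does not inspect the MeCab library itself, whose architecture would have to be checked separately.

```python
import platform
import struct
import sys


def interpreter_info() -> dict:
    """Collect facts about the running Python that matter when loading native libraries."""
    return {
        "python_version": platform.python_version(),
        # Size of a C pointer in bits: 32 or 64 for the running interpreter.
        "pointer_bits": struct.calcsize("P") * 8,
        # Machine type reported by the OS, e.g. an x86 or ARM identifier.
        "machine": platform.machine(),
        # Path of the interpreter in use, useful to confirm the virtual environment.
        "executable": sys.executable,
    }


for key, value in interpreter_info().items():
    print(f"{key}: {value}")
```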

Other Windows users have gotten things to work, and GitHub CI does check Windows and mecab, so it can be made to work. Unfortunately, I don't know how to check your specific situation.

jzohrab · Jul 18 '25 01:07