PythonScript editor.getWordChars() doesn't work in PythonScript 3.0.14

Using Notepad++ 8.4.4 x64 and PythonScript_Full_3.0.14.0_x64.zip:

editor.getWordChars() triggers this error message: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

In PythonScript 2.0.0 (using the same Notepad++ binary), the method returns a string containing the bytes/chars 0xFF-0x80, followed by z-a, _, Z-A, and 9-0 (in that order). That doesn't look like a UTF-8 string...
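
For reference, the failure can be reproduced outside Notepad++ in plain Python 3. This is only a sketch: it rebuilds the byte sequence described above by hand (it does not call PythonScript or Scintilla) and assumes that the plugin tries to decode the raw buffer as UTF-8. The variable names are made up for illustration.

```python
# Rebuild the byte sequence described above by hand:
# 0xFF down to 0x80, then z-a, '_', Z-A and 9-0.
high_bytes = bytes(range(0xFF, 0x7F, -1))
lower = bytes(range(ord("z"), ord("a") - 1, -1))
upper = bytes(range(ord("Z"), ord("A") - 1, -1))
digits = bytes(range(ord("9"), ord("0") - 1, -1))
word_chars = high_bytes + lower + b"_" + upper + digits

# Assumption: the Python 3 plugin decodes the buffer as UTF-8.
# 0xFF can never start a UTF-8 sequence, so decoding fails on the first byte.
try:
    word_chars.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

The first byte alone is enough to make the decode fail, which matches the "position 0" in the reported message.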

Aug 05 '22 10:08 cmagnush

See https://www.scintilla.org/ScintillaDoc.html#SCI_GETWORDCHARS, which notes:

"For multi-byte encodings, this API will not return meaningful values for 0x80 and above."

So it needs to be checked whether a change in the encoding could fix the problem.

Current Python 2 output:

>>> editor.getWordChars()
'\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0\xef\xee\xed\xec\xeb\xea\xe9\xe8\xe7\xe6\xe5\xe4\xe3\xe2\xe1\xe0\xdf\xde\xdd\xdc\xdb\xda\xd9\xd8\xd7\xd6\xd5\xd4\xd3\xd2\xd1\xd0\xcf\xce\xcd\xcc\xcb\xca\xc9\xc8\xc7\xc6\xc5\xc4\xc3\xc2\xc1\xc0\xbf\xbe\xbd\xbc\xbb\xba\xb9\xb8\xb7\xb6\xb5\xb4\xb3\xb2\xb1\xb0\xaf\xae\xad\xac\xab\xaa\xa9\xa8\xa7\xa6\xa5\xa4\xa3\xa2\xa1\xa0\x9f\x9e\x9d\x9c\x9b\x9a\x99\x98\x97\x96\x95\x94\x93\x92\x91\x90\x8f\x8e\x8d\x8c\x8b\x8a\x89\x88\x87\x86\x85\x84\x83\x82\x81\x80zyxwvutsrqponmlkjihgfedcba_ZYXWVUTSRQPONMLKJIHGFEDCBA9876543210'

Nov 05 '22 12:11 chcg

With Python 3, after adding an iso_latin_1_to_utf8 conversion, I get:

>>> editor.getWordChars()
'ÿþýüûúùø÷öõôóòñðïîíìëêéèçæåäãâáàßÞÝÜÛÚÙØ×ÖÕÔÓÒÑÐÏÎÍÌËÊÉÈÇÆÅÄÃÂÁÀ¿¾½¼»º¹¸·¶µ´³²±°¯®\xad¬«ª©¨§¦¥¤£¢¡\xa0\x9f\x9e\x9d\x9c\x9b\x9a\x99\x98\x97\x96\x95\x94\x93\x92\x91\x90\x8f\x8e\x8d\x8c\x8b\x8a\x89\x88\x87\x86\x85\x84\x83\x82\x81\x80zyxwvutsrqponmlkjihgfedcba_ZYXWVUTSRQPONMLKJIHGFEDCBA9876543210'
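
The effect of such a conversion can be approximated in plain Python 3. This is a sketch under the assumption that the conversion maps every byte to the code point with the same value (ISO 8859-1 / Latin-1); the actual iso_latin_1_to_utf8 helper inside the plugin is not shown in this thread, and the names below are illustrative only.

```python
# Rebuild the raw buffer by hand: 0xFF down to 0x80, then z-a, '_', Z-A, 9-0.
raw = (
    bytes(range(0xFF, 0x7F, -1))
    + bytes(range(ord("z"), ord("a") - 1, -1))
    + b"_"
    + bytes(range(ord("Z"), ord("A") - 1, -1))
    + bytes(range(ord("9"), ord("0") - 1, -1))
)

# Latin-1 maps each byte 0x00-0xFF to the code point of the same value,
# so decoding cannot fail and keeps one character per byte.
text = raw.decode("latin-1")

# Re-encoding as UTF-8 turns every character >= 0x80 into two bytes.
utf8_bytes = text.encode("utf-8")

print(repr(text[:3]))
print(len(raw), len(text), len(utf8_bytes))
```

The round trip is lossless (`text.encode("latin-1")` gives back the original bytes), but whether the characters above 0x7F are meaningful still depends on the document encoding, as the Scintilla note on multi-byte encodings points out.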

Nov 12 '22 23:11 chcg