Notepad3 Can't cope with codepage on some occasions

Open popyoung opened this issue 2 years ago • 5 comments

Version: Notepad3 (x64) v5.21.1129.1

You can reproduce this issue by creating a new .txt file whose hex view is "B1 BE CE C4 A1 A1 A1 A1 A1 A1 A1 A1". The bytes "B1 BE CE C4" encode the Chinese word "本文", while the "A1" bytes are meaningless filler and are displayed as spaces. The correct code page is CP-936 (GBK), but Notepad3 detects EUC-JP instead, so "本文" is displayed as the meaningless string "云猟", which is not a Japanese word either.
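For anyone who wants to see the ambiguity outside Notepad3, here is a minimal Python sketch. It only uses Python's built-in `gbk` and `euc_jp` codecs and says nothing about how Notepad3's detector works internally. It shows that the reported bytes are valid in both encodings, which is why a detector has to guess.

```python
# Bytes from the report: "B1 BE CE C4" followed by eight "A1" bytes.
raw = bytes.fromhex("B1 BE CE C4 A1 A1 A1 A1 A1 A1 A1 A1")

# The same byte sequence decodes without error under both code pages,
# so validity alone cannot tell them apart.
as_gbk = raw.decode("gbk")       # intended reading (CP-936 / GBK)
as_eucjp = raw.decode("euc_jp")  # what the detector picked

print(repr(as_gbk))
print(repr(as_eucjp))
```

Both calls succeed; only the first yields the intended "本文".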

May 28 '22 14:05 popyoung

The code page detection is based on machine learning, trained on a set of web pages (Wiki) for different ANSI languages, so there may be cases where the training set was not sufficient. In your case, it would be easier to switch off code page detection and use fixed file tags ( ... encoding: CP-936 ) in the file header or footer, or to use UTF-8 throughout all your files (http://utf8everywhere.org/).
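If you go the UTF-8 route, existing files have to be converted once. The following is a rough sketch of such a conversion, not a Notepad3 feature: the helper name `reencode_to_utf8` is made up for illustration, and it assumes you already know the file's real encoding (GBK here).

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding="gbk"):
    """Rewrite `path` in place as UTF-8, assuming it is currently `source_encoding`.

    Hypothetical helper for illustration; back up your files before using it.
    """
    p = Path(path)
    # Decoding raises UnicodeDecodeError if the assumed encoding does not fit the bytes.
    text = p.read_bytes().decode(source_encoding)
    # Write raw bytes so that line endings are preserved exactly.
    p.write_bytes(text.encode("utf-8"))
    return text
```

Note that a successful decode does not prove the assumed encoding was right (the bytes from this issue decode under EUC-JP too), so check the result by eye.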