Can't cope with codepage on some occasions

Open popyoung opened this issue 2 years ago • 5 comments

Version: Notepad3 (x64) v5.21.1129.1

You can reproduce this issue by creating a new .txt file whose hex view is "B1 BE CE C4 A1 A1 A1 A1 A1 A1 A1 A1". "B1 BE CE C4" encodes the Chinese word "本文", while the "A1" bytes carry no meaning and are displayed as spaces. The correct code page is CP-936 (GBK), but Notepad3 detects EUC-JP instead, so "本文" is displayed as the meaningless "云猟", which is not a Japanese word either.
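
The ambiguity can be checked outside Notepad3. The following is a minimal sketch using Python's standard `gbk` and `euc_jp` codecs (not Notepad3's detector); it only shows that the same twelve bytes decode without error under both code pages, which is why a detector has to guess.

```python
# The byte sequence from the report above.
raw = bytes.fromhex("B1 BE CE C4 A1 A1 A1 A1 A1 A1 A1 A1")

# Decode the same bytes under both candidate code pages.
# Neither call raises, so the bytes are valid in both encodings.
as_gbk = raw.decode("gbk")        # CP-936 / GBK interpretation
as_euc_jp = raw.decode("euc_jp")  # EUC-JP interpretation

print(repr(as_gbk))
print(repr(as_euc_jp))
```

Because both decodings succeed, byte validity alone cannot separate the two code pages; only a statistical model of likely text can.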

popyoung avatar May 28 '22 14:05 popyoung

The code page detection is based on machine learning, trained on a set of web pages (Wiki) for different ANSI languages, so there may be cases where the training set was not sufficient. In your case, it would be easier to switch off code page detection and use fixed file tags ( ... encoding: CP-936 ) in the file header or footer, or to use UTF-8 throughout all your files (http://utf8everywhere.org/).
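
For the "UTF-8 throughout" route, a one-off conversion could look like the sketch below. It is plain Python, not a Notepad3 feature; the helper name `reencode_to_utf8` is made up for illustration, and it assumes you already know the source file really is CP-936.

```python
def reencode_to_utf8(data: bytes, source_encoding: str = "cp936") -> bytes:
    """Decode bytes from a known legacy code page and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the data is not valid in source_encoding,
    which is preferable to silently writing garbage.
    """
    return data.decode(source_encoding).encode("utf-8")


# The first four bytes of the sample from this issue, which encode "本文" in GBK.
legacy = bytes.fromhex("B1 BE CE C4")
converted = reencode_to_utf8(legacy)
print(converted.decode("utf-8"))
```

Once a file is stored as UTF-8, no ANSI code page guess is needed to read it.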

RaiKoHoff avatar Nov 18 '22 09:11 RaiKoHoff

Notepad++ detects the encoding of the same files correctly.

gtumanyan avatar Jun 25 '23 09:06 gtumanyan

Examples please. 🤔 👀

  • With which files?
  • And with which version of Notepad3?

hpwamr avatar Jun 25 '23 11:06 hpwamr

  • https://www.upload.ee/download/15378037/0e56cd2f5d021d1ef597/nscopyd.bat
  • Notepad3 (x64) 6.23.624.1

gtumanyan avatar Jun 25 '23 15:06 gtumanyan

Hello @RaiKoHoff ,

See the result of DevDebugMode

(screenshot: 2023-06-25_183611)

I suggest changing AnalyzeReliableConfidenceLevel from 90 to 85? 🤔

(screenshot: 2023-06-25_183917)
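
To illustrate why the threshold matters, here is a rough sketch of how a confidence cutoff can gate a detector's guess. This is not Notepad3's code: the function `choose_encoding`, the `>=` comparison, the fallback code page, and the confidence value 87 are all assumptions made up for the example.

```python
def choose_encoding(detected: str, confidence: int,
                    threshold: int, fallback: str) -> str:
    """Accept the detector's guess only if it meets the confidence threshold.

    Hypothetical helper: the comparison rule is an assumption, not taken
    from the Notepad3 sources.
    """
    return detected if confidence >= threshold else fallback


# Hypothetical detector result: a correct guess with 87% confidence.
guess, confidence = "CP-936", 87
fallback = "Windows-1252"  # assumed default ANSI code page

strict = choose_encoding(guess, confidence, threshold=90, fallback=fallback)
relaxed = choose_encoding(guess, confidence, threshold=85, fallback=fallback)
print(strict, relaxed)
```

Under these assumptions a guess scoring between 85 and 89 is rejected at 90 but accepted at 85. The trade-off is that a lower threshold also lets more wrong guesses through.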

hpwamr avatar Jun 25 '23 16:06 hpwamr