notepad-plus-plus [BUG] The encoding name ANSI is incorrect, should be Windows-1252

Is there an existing issue for this?

[X] I have searched the existing issues

Description of the Issue

The first entry under Encoding is called "ANSI". This encoding name is incorrect. Its correct name is "Windows-1252".

Steps To Reproduce

Click on Encoding in the menu bar and check the first entry.

Current Behavior

The first entry in Encoding is called "ANSI" (also in the status bar). This name is incorrect.

Expected Behavior

The correct name for this encoding is "Windows-1252" (correct name under Encoding > Character Set > Western European). For details about the wrong name "ANSI" see Wikipedia: Windows-1252: Name.

Debug Information

Notepad++ v8.6.9   (64-bit)
Build time : Jul 12 2024 - 05:09:25
Path : C:\Program Files\Notepad++\notepad++.exe
Command Line : [...] 
Admin mode : OFF
Local Conf mode : OFF
Cloud Config : OFF
Periodic Backup : ON
OS Name : Windows 10 Pro for Workstations (64-bit)
OS Version : 22H2
OS Build : 19045.4651
Current ANSI codepage : 1252
Plugins : 
    mimeTools (3.1)
    NppConverter (4.6)
    NppExport (0.4)

Anything else?

No response

Jul 18 '24 17:07 beppo-dd

The first entry in Encoding is called "ANSI"

I believe that entry is meant to indicate the current code page that your computer is using, without being specific about it. I suppose that, theoretically, if you change that code page in your system, Notepad++ wouldn't change anything about the selected menu entry or what is shown on the status bar, but it would interpret what it shows you for that file based on the new code page you had set.
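To illustrate how the same bytes read differently depending on the active code page, here is a minimal sketch. It is not Notepad++ code: `decodeHighByte` is a hypothetical helper, and it only covers bytes 0xC0..0xFF under Windows-1252 and Windows-1251, where both mappings are simple offsets.

```cpp
// Decode a single byte in the range 0xC0..0xFF under one of two code pages.
// Anything outside that range, or any other code page, yields U+FFFD
// (the Unicode replacement character) because this sketch does not cover it.
char32_t decodeHighByte(unsigned char b, unsigned codePage) {
    if (b < 0xC0) {
        return 0xFFFD;
    }
    switch (codePage) {
    case 1252:
        // Windows-1252 matches ISO-8859-1 in this range.
        return b;
    case 1251:
        // Windows-1251 maps 0xC0..0xFF to the Cyrillic letters U+0410..U+044F.
        return 0x0410 + (b - 0xC0);
    default:
        return 0xFFFD;
    }
}
```

The byte 0xE9 comes out as U+00E9 (é) under 1252 and as U+0439 (й) under 1251, while the menu entry would say "ANSI" in both cases.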

Jul 23 '24 11:07 alankilborn

Cause: The problem arises when the experimental UTF-8 support of Windows 10 and later is activated in the Control Panel ("Beta: Use Unicode UTF-8 for worldwide language support"). While this is a welcome feature for programmers and Windows users, it causes trouble when opening CP1252 files. The same problem applies to Programmer's Notepad.

Workaround: When this option is activated, CP_ACP (0) maps to CP_UTF8 (65001); otherwise, on Western European Windows, it maps to 1252. ANSI and UTF-8 therefore now mean the same thing, and the user has to select "encoding/more/western europe/CP1252" (quite deeply nested in the menu hierarchy).

Proposal for a bugfix (i.e. removing the unexpected behaviour): Detect whether CP_ACP maps to CP_UTF8. If so, do one of the following:

Remove the menu entries "encoding/ANSI" and "encoding/convert to ANSI".
Gray out these menu entries and add a tooltip saying "On this system, ANSI is the same as UTF-8".
Simulate ANSI by guessing the "old" code page for the user's language, or the OEM code page. In this case, change the menu text to "ANSI = CP1252" to make clear that the ANSI code page is now a guess.
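
The detection step could be sketched roughly like this. It is only an illustration, not Notepad++ code: `currentAnsiCodePage`, `acpIsUtf8` and `ansiMenuLabel` are hypothetical names, and the guessed legacy code page is simply passed in by the caller. On Windows, `GetACP()` returns the code page that CP_ACP currently resolves to.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Numeric value of CP_UTF8 in the Windows headers.
constexpr unsigned kCodePageUtf8 = 65001;

// Hypothetical helper: the code page that CP_ACP currently resolves to.
// Off Windows there is no such setting, so 1252 is assumed for illustration only.
unsigned currentAnsiCodePage() {
#ifdef _WIN32
    return GetACP();
#else
    return 1252;
#endif
}

// True when "ANSI" and UTF-8 mean the same thing on this system.
bool acpIsUtf8(unsigned acp) {
    return acp == kCodePageUtf8;
}

// Hypothetical menu label for the third option in the proposal:
// if ANSI would just be UTF-8, show the guessed legacy code page instead.
std::string ansiMenuLabel(unsigned acp, unsigned guessedLegacyCp) {
    if (acpIsUtf8(acp)) {
        return "ANSI = CP" + std::to_string(guessedLegacyCp);
    }
    return "ANSI";
}
```

Calling `ansiMenuLabel(currentAnsiCodePage(), 1252)` on a system with the UTF-8 option enabled would give "ANSI = CP1252"; how the legacy code page is actually guessed is left open here.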

Aug 26 '24 18:08 Haftmann

Moreover, auto-detecting the code page of input text files leads to nonsense when CP_ACP maps to CP_UTF8:

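// Try strict UTF-8 first; fall back to the ANSI code page if that fails.
int utf16len;
UINT inputcp = CP_UTF8;
for (;;) {
    // With MB_ERR_INVALID_CHARS the call returns 0 on invalid input
    // (GetLastError() is then ERROR_NO_UNICODE_TRANSLATION).
    utf16len = MultiByteToWideChar(inputcp, MB_ERR_INVALID_CHARS, text, textsize, nullptr, 0);
    if (utf16len > 0) break;       // check the return value; the last-error code is not reset on success
    if (inputcp == CP_ACP) break;  // already fell back once, give up
    inputcp = CP_ACP;              // identical to CP_UTF8 when the system ANSI code page is 65001
}
wchar_t *utext = new wchar_t[utf16len];
MultiByteToWideChar(inputcp, 0, text, textsize, utext, utf16len);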

In this case, one idea would be to leave the loop with inputcp set to 1252 instead.
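
A portable sketch of that idea, under some assumptions: `isValidUtf8` and `pickInputCodePage` are hypothetical helpers, the validator is a stand-in for the strict UTF-8 attempt in the loop above (it may not agree with MultiByteToWideChar in every corner case), and 1252 as the default fallback is just the Western European guess from this thread.

```cpp
#include <cstddef>

// Returns true if the buffer is well-formed UTF-8: no overlong forms,
// no surrogates, nothing above U+10FFFF, no truncated sequences.
bool isValidUtf8(const unsigned char* s, std::size_t n) {
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = s[i];
        if (b <= 0x7F) { ++i; continue; }        // ASCII
        std::size_t extra = 0;                   // continuation bytes expected
        unsigned char lo = 0x80, hi = 0xBF;      // allowed range of the first one
        if (b >= 0xC2 && b <= 0xDF)      { extra = 1; }
        else if (b == 0xE0)              { extra = 2; lo = 0xA0; }  // no overlongs
        else if (b == 0xED)              { extra = 2; hi = 0x9F; }  // no surrogates
        else if (b >= 0xE1 && b <= 0xEF) { extra = 2; }
        else if (b == 0xF0)              { extra = 3; lo = 0x90; }  // no overlongs
        else if (b >= 0xF1 && b <= 0xF3) { extra = 3; }
        else if (b == 0xF4)              { extra = 3; hi = 0x8F; }  // max U+10FFFF
        else return false;                       // stray continuation or invalid lead
        if (n - i <= extra) return false;        // sequence cut off at end of buffer
        if (s[i + 1] < lo || s[i + 1] > hi) return false;
        for (std::size_t k = 2; k <= extra; ++k) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Choose the code page for decoding: UTF-8 if the bytes are valid UTF-8,
// otherwise the ANSI code page, unless that is itself UTF-8, in which
// case fall back to an explicit legacy code page.
unsigned pickInputCodePage(const unsigned char* text, std::size_t size,
                           unsigned acp, unsigned legacyFallback = 1252) {
    const unsigned utf8 = 65001;
    if (isValidUtf8(text, size)) return utf8;
    return acp == utf8 ? legacyFallback : acp;
}
```

The result would then be passed to the final MultiByteToWideChar call in place of CP_ACP. A file containing the bytes "caf" followed by 0xE9 is not valid UTF-8, so with the UTF-8 option enabled it would be decoded as 1252 rather than retried as UTF-8.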

Aug 26 '24 18:08 Haftmann

Moreover, auto-detecting the code page of input text files leads to nonsense when CP_ACP maps to CP_UTF8

Excellent point; one that's been made before, sadly to no avail so far.

See, for example:

https://github.com/notepad-plus-plus/notepad-plus-plus/issues/14162#issuecomment-2043333548
https://github.com/notepad-plus-plus/notepad-plus-plus/issues/14681#issuecomment-1948459000

Aug 27 '24 00:08 rdipardo