
A-tilde (ã) and a-umlaut (ä) characters incorrectly treated in Notepad++ plugin

Open TaoK opened this issue 8 years ago • 6 comments

Felipe Gualberto points out that Portuguese tilde-carrying characters are being mangled by the Poor Man's T-SQL Formatter plugin in Notepad++:

select * from Mão where Name = 'Felipe'

becomes

SELECT *
FROM M[xC3][xA3]o
WHERE NAME = 'Felipe'

Where the square brackets indicate Notepad++'s special "unrecognized binary sequence" white-on-black formatting.

Interestingly, other international characters seem to work fine, as in these Arabic and Chinese examples:

SELECT *
FROM العَرَبِيَّة
WHERE NAME = 'Felipe'

SELECT *
FROM 漢字
WHERE NAME = 'Felipe'

There is definitely nothing special about ã in any Poor Man's T-SQL Formatter code, so this suggests there may be an issue in Notepad++ or Scintilla causing this behavior...?
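
As a sanity check on the bracketed bytes, here is a minimal Python sketch (not part of the plugin; the variable names are illustrative only) that prints the UTF-8 encoding of each table name used above. It shows that C3 A3 is simply the UTF-8 encoding of ã, and that the Arabic and Chinese names are multi-byte in UTF-8 as well, so being multi-byte does not by itself explain the difference.

```python
# Table names from the examples above; \u00e3 is the precomposed "ã".
names = ["M\u00e3o", "العَرَبِيَّة", "漢字"]

# Map each name to its UTF-8 bytes, written as space-separated hex (Python 3.8+).
utf8_hex = {name: name.encode("utf-8").hex(" ") for name in names}

for name, hex_bytes in utf8_hex.items():
    print(f"{name!r}: {hex_bytes}")
```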

TaoK avatar Jun 06 '17 06:06 TaoK

Thanks, Tao. I can confirm this on all the pt-BR systems I have, with default settings and a default installation of the plugin.

FelipeCostaGualberto avatar Jun 06 '17 10:06 FelipeCostaGualberto

It looks like #217 has a possibly-more-complete description of probably-the-same-issue

TaoK avatar Jul 28 '19 14:07 TaoK

I'm removing the "duplicate" label, because I can reproduce this and I can't reproduce the issue reported in #217, but I do get something useful from that other issue: The issue only occurs if the document is not set to "Encode in ANSI". As far as I can tell, all other encodings produce the issue reported, but "ANSI" does not...
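
One way to see why "ANSI" documents might behave differently, as a small Python sketch. It assumes that "ANSI" here corresponds to the Windows-1252 code page, which is typical for Western European Windows setups but is an assumption, not something confirmed in this thread. Under that assumption ã is a single byte, so a byte and a character are the same thing; in UTF-8 it is two bytes.

```python
a_tilde = "\u00e3"  # ã

# Assumption: "Encode in ANSI" means Windows-1252 on the affected systems.
ansi_bytes = a_tilde.encode("cp1252")
utf8_bytes = a_tilde.encode("utf-8")

print(ansi_bytes.hex(" "), "vs", utf8_bytes.hex(" "))
```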

TaoK avatar Jul 28 '19 17:07 TaoK

The issue only occurs if the document is not set to "Encode in ANSI".

That was really helpful, thanks Tao!

FelipeCostaGualberto avatar Jul 29 '19 12:07 FelipeCostaGualberto

I've made some progress in understanding what's been happening here, although it seems like a major mess.

It looks like the C++ to .Net interop machinery in use here, when Scintilla ends up feeding a buffer into a .Net StringBuilder, doesn't put Unicode characters into the resulting StringBuilder, but rather bytes (one "character" per byte of the buffer).

Most of the time no-one notices, because the formatter only "reacts" to simple ANSI characters (those whose single-byte value is the same as their UTF-8 encoding, i.e. the ASCII range), treating all the rest rather simply/naively. Most importantly, when these nonsense-strings are fed back into Scintilla, it interprets them as byte sequences again, and everything "washes out".
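
The "washing out" can be sketched in Python, outside the plugin entirely. This is only a simulation under a stated assumption: Latin-1 stands in for the byte-per-character mapping (it maps each byte value 0-255 to the code point with the same number), and the helper names `bytes_as_chars` and `chars_as_bytes` are hypothetical, not functions from the plugin or from Scintilla.

```python
def bytes_as_chars(text: str) -> str:
    """Simulate a UTF-8 buffer being read as one character per byte."""
    return text.encode("utf-8").decode("latin-1")


def chars_as_bytes(mangled: str) -> str:
    """Simulate the reverse trip: each character becomes one byte again,
    and the resulting buffer is interpreted as UTF-8."""
    return mangled.encode("latin-1").decode("utf-8")


original = "select * from M\u00e3o where Name = 'Felipe'"  # "Mão"
mangled = bytes_as_chars(original)

# The intermediate string is one character longer: "ã" became two characters.
print(repr(mangled))

# As long as nothing touches those characters, the round trip restores the text.
print(chars_as_bytes(mangled) == original)
```

In this simulation the intermediate string is nonsense as text, but the ASCII parts the formatter cares about are untouched, which is consistent with the problem going unnoticed most of the time. It does not explain why ã in particular fails to survive the trip in the plugin.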

This mess happens in formatSqlCommand() in PoorMansTSqlFormatterNppPlugin/Main.cs, and I'm working on it. Been distracted by some Visual Studio issues over the last couple of days, but coming back to it now.

(To be clear: I don't know whether it's Scintilla misbehaving here, or the NPP .Net plugin bridge, or just something stupid that I personally am doing in this code.)

TaoK avatar Jul 31 '19 22:07 TaoK

I came to report that this was happening to me, but found this open issue. Adding my me too!

Thank you for all the effort put into this great tool!

mvbentes avatar Apr 15 '21 18:04 mvbentes