In Notepad++, Code Alignment does not work on Unicode Glyphs
UNICODE GLYPH POINTS
… Elipsis.Alt +0133
⸘⸘ Upside-Down Interrobang.Alt +2E18
test test.test
test0 test0.test0
Aligning the lines above by space leaves the second line 2 columns short and the third line 4 columns short, so each glyph is being treated as 3 characters long instead of 1.
Worth noting: some fixed-width fonts don't seem to render these glyphs at the standard character width.
If possible, this should be fixed.
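The shortfall reported above (2 columns for one glyph, 4 for two) is the same as the difference between each glyph's character count and its UTF-8 byte count, which suggests the alignment is measuring bytes. A minimal sketch of that comparison; `GlyphWidths` and its methods are hypothetical names, not part of the plugin:

```csharp
using System;
using System.Text;

public static class GlyphWidths
{
    // Number of UTF-16 code units, which is what string.Length reports.
    public static int CharCount(string s) => s.Length;

    // Number of bytes the same text occupies when encoded as UTF-8.
    public static int Utf8ByteCount(string s) => Encoding.UTF8.GetByteCount(s);
}
```

For the ellipsis, `GlyphWidths.CharCount("\u2026")` is 1 while `GlyphWidths.Utf8ByteCount("\u2026")` is 3. For the two interrobangs, `"\u2E18\u2E18"` gives 2 and 6.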
I've a feeling this is actually a limitation of the Notepad++ API or the C# wrapper I use to access it.
My logic is pretty simple: the Text property in Line.cs makes the call. https://github.com/cpmcgrath/codealignment/blob/master/CodeAlignment.Npp/Implementations/Line.cs
public string Text
{
    get
    {
        // Scintilla positions are byte offsets; the end position excludes line-end characters.
        var start = (int)Win32.SendMessage(m_docPointer, SciMsg.SCI_POSITIONFROMLINE, m_lineNo, 0);
        var end = (int)Win32.SendMessage(m_docPointer, SciMsg.SCI_GETLINEENDPOSITION, m_lineNo, 0);
        var builder = new StringBuilder(end - start + 1);
        // SCI_GETLINE copies the raw bytes of the line into the buffer.
        Win32.SendMessage(m_docPointer, SciMsg.SCI_GETLINE, m_lineNo, builder);
        return builder.ToString();
    }
}
And the imports are in NppPluginNETHelper.cs: https://github.com/cpmcgrath/codealignment/blob/master/CodeAlignment.Npp/NppPluginNETHelper.cs
[DllImport("user32")]
public static extern IntPtr SendMessage(IntPtr hWnd, SciMsg Msg, int wParam, [MarshalAs(UnmanagedType.LPStr)] StringBuilder lParam);
The MarshalAs might be what's causing the problem.
Still to test, but I think it could be as simple as changing [MarshalAs(UnmanagedType.LPStr)] to [MarshalAs(UnmanagedType.LPWStr)].
No, that makes it return gibberish.
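One alternative to changing the string marshalling is to skip it entirely: receive the line as raw bytes and decode them explicitly. The following is only a sketch under stated assumptions. `RawLineReader`, the `byte[]` overload of `SendMessage`, and the `scintilla` handle parameter are hypothetical and not in the plugin; the message numbers are the values I believe Scintilla.h defines; and the document is assumed to be stored as UTF-8.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class RawLineReader
{
    // Scintilla message numbers (assumed to match Scintilla.h).
    const uint SCI_GETLINE    = 2153;
    const uint SCI_LINELENGTH = 2350;

    // Hypothetical overloads: the buffer is a plain byte array, so no
    // ANSI-to-Unicode string conversion happens at the P/Invoke boundary.
    [DllImport("user32")]
    static extern IntPtr SendMessage(IntPtr hWnd, uint msg, IntPtr wParam, IntPtr lParam);

    [DllImport("user32")]
    static extern IntPtr SendMessage(IntPtr hWnd, uint msg, IntPtr wParam, [In, Out] byte[] lParam);

    public static string ReadLine(IntPtr scintilla, int lineNo)
    {
        // SCI_LINELENGTH reports the length in bytes, including line-end characters.
        int length = (int)SendMessage(scintilla, SCI_LINELENGTH, (IntPtr)lineNo, IntPtr.Zero);
        if (length <= 0) return string.Empty;

        var buffer = new byte[length];
        SendMessage(scintilla, SCI_GETLINE, (IntPtr)lineNo, buffer);
        return DecodeLine(buffer);
    }

    // Assumes UTF-8 content; strips trailing CR/LF bytes before decoding.
    public static string DecodeLine(byte[] raw)
    {
        int end = raw.Length;
        while (end > 0 && (raw[end - 1] == (byte)'\n' || raw[end - 1] == (byte)'\r'))
            end--;
        return Encoding.UTF8.GetString(raw, 0, end);
    }
}
```

Note that `DecodeLine` drops the line-end characters, which may differ from what the existing Text property returns. `RawLineReader.DecodeLine(new byte[] { 0xE2, 0x80, 0xA6, 0x0D, 0x0A })` yields the single character `…`.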
The rules to detect Unicode seem quite simple, but I'm doing something wrong. SCI_GETLINE fills a StringBuilder where each character represents a byte. The rule for detecting a multi-byte UTF-8 character is that the first byte is between 0xC0 and 0xFD (0xC2 to 0xF4 under the current UTF-8 definition in RFC 3629), and subsequent bytes are between 0x80 and 0xBF.
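The byte rules just described can be turned into a character count by skipping continuation bytes. This is a minimal sketch, assuming the input is well-formed UTF-8; `Utf8Counter` is a hypothetical helper and it does no validation:

```csharp
public static class Utf8Counter
{
    // A continuation byte has the bit pattern 10xxxxxx, i.e. 0x80 to 0xBF.
    public static bool IsContinuation(byte b) => (b & 0xC0) == 0x80;

    // Counts code points by counting every byte that is not a continuation byte.
    public static int CountCodePoints(byte[] utf8)
    {
        int count = 0;
        foreach (byte b in utf8)
        {
            if (!IsContinuation(b)) count++;
        }
        return count;
    }
}
```

`Utf8Counter.CountCodePoints(new byte[] { 0xE2, 0x80, 0xA6 })` returns 1. This counts code points, not display columns, so combining marks and wide glyphs would still need separate handling.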
But for the ellipsis above (…), a binary viewer shows the bytes in the file as 0xE2 0x80 0xA6, while the codes passed to me are 0xE2 0xAC 0xA6. The fact that I can still detect the correct number of bytes in the character should be enough to fix this, but I don't know if I'm comfortable releasing it like that.
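One possible explanation for 0x80 arriving as 0xAC, offered as an assumption rather than a confirmed diagnosis: with UnmanagedType.LPStr the bytes pass through the system ANSI code page on their way into the StringBuilder. If that code page is Windows-1252, byte 0x80 becomes U+20AC (the euro sign), and keeping only the low byte of that char gives 0xAC. Bytes 0xE2 and 0xA6 map to U+00E2 and U+00A6, so they come through unchanged. The sketch below hard-codes those three mappings instead of calling an encoding API; `AnsiRoundTrip` is a hypothetical name.

```csharp
using System;

public static class AnsiRoundTrip
{
    // Windows-1252 decoding of the bytes 0xE2 0x80 0xA6, written out by hand:
    // 0xE2 -> U+00E2, 0x80 -> U+20AC (euro sign), 0xA6 -> U+00A6.
    public static readonly char[] DecodedAs1252 = { '\u00E2', '\u20AC', '\u00A6' };

    // Keeps only the low 8 bits of each char, as an unchecked (byte) cast does.
    public static byte[] TruncateToBytes(char[] chars)
    {
        var result = new byte[chars.Length];
        for (int i = 0; i < chars.Length; i++)
        {
            result[i] = unchecked((byte)chars[i]);
        }
        return result;
    }
}
```

`AnsiRoundTrip.TruncateToBytes(AnsiRoundTrip.DecodedAs1252)` produces 0xE2 0xAC 0xA6, the same sequence reported above. If this is what is happening, the byte count stays correct, but any original byte in the 0x80 to 0x9F range may be altered or lost.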
C# has Encoding.Unicode.GetString(byte[]), but it was just giving me garbage.
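A likely reason for the garbage: in .NET, Encoding.Unicode means UTF-16 little-endian, so it pairs up bytes that were written as UTF-8. Encoding.UTF8 is the decoder that matches the byte rules discussed above. A small sketch comparing the two on the ellipsis bytes as they appear in the file; `DecodeDemo` is a hypothetical name:

```csharp
using System.Text;

public static class DecodeDemo
{
    // The ellipsis as stored in the file, per the binary viewer.
    public static readonly byte[] EllipsisUtf8 = { 0xE2, 0x80, 0xA6 };

    // Interprets the bytes as UTF-8.
    public static string AsUtf8(byte[] bytes) => Encoding.UTF8.GetString(bytes);

    // Interprets the bytes as UTF-16 little-endian, which is what Encoding.Unicode is.
    public static string AsUtf16(byte[] bytes) => Encoding.Unicode.GetString(bytes);
}
```

`DecodeDemo.AsUtf8(DecodeDemo.EllipsisUtf8)` returns `…`, while `DecodeDemo.AsUtf16` on the same bytes returns unrelated characters. This only helps if the bytes are intact, so it would need to be combined with a way of receiving them unmodified.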
Another example:
Hystérie Connective=3:09
Ghetto=2:41
Clé De Contact=2:50
After aligning by equals, it becomes:
Hystérie Connective =3:09
Ghetto =2:41
Clé De Contact =2:50
I guess you haven't done Unicode normalization. Just normalize to NFC and some of these issues will be fixed.
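For reference, NFC normalization in C# is a single call. The sketch below shows what it changes, assuming the text contains decomposed accents (a base letter followed by a combining mark); `NormalizationDemo` is a hypothetical name. The thread does not establish whether the sample above is decomposed. If it is already precomposed, NFC changes nothing, and é still occupies 2 bytes in UTF-8, so a byte-based width would remain off by one.

```csharp
using System.Text;

public static class NormalizationDemo
{
    // Composes base letters and combining marks into single code points where possible.
    public static string ToNfc(string s) => s.Normalize(NormalizationForm.FormC);
}
```

For example, `"Cle\u0301"` (e followed by a combining acute accent) has Length 4, and `NormalizationDemo.ToNfc("Cle\u0301")` returns `"Cl\u00E9"` with Length 3.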