Unidecode.NET

Does not work for characters with code points above 0xFFFF

Open TETYYS opened this issue 4 years ago • 0 comments

`char.MaxValue` in C# is 65535 (0xFFFF), so `var high = c >> 8;` in the source code can never exceed 255, while the character files have values ranging from 468 (0x1D4) to 497 (0x1F1). Those files can therefore never be selected.
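The arithmetic can be checked with a short sketch. It does not call Unidecode.NET; it only takes U+1F101 from the reproduction and compares the page index computed per `char` with the one computed from the full code point. The variable names are illustrative.

```csharp
using System;

string s = "\U0001F101"; // 🄁, one code point stored as two UTF-16 chars

// Iterating by char only ever sees the two surrogate halves.
foreach (char c in s)
{
    int high = c >> 8; // at most 255, because char.MaxValue is 0xFFFF
    Console.WriteLine($"char U+{(int)c:X4} -> page {high}");
}

// Combining the surrogate pair gives the real code point and its page.
int codePoint = char.ConvertToUtf32(s, 0);
Console.WriteLine($"code point U+{codePoint:X} -> page {codePoint >> 8}");

// Prints:
// char U+D83C -> page 216
// char U+DD01 -> page 221
// code point U+1F101 -> page 497
```

Page 497 is only reachable from the combined code point, never from either surrogate half.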

Looping over the individual `char`s of the input is incorrect; `EnumerateRunes()` must be used instead.
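A minimal sketch of what rune-based iteration could look like, assuming .NET Core 3.0 or later. The `pages` dictionary and the `Transliterate` function are hypothetical stand-ins, not the library's actual tables or API; only the two mappings mentioned in the reproduction are filled in.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the per-page tables: page number -> 256 entries.
var pages = new Dictionary<int, string[]>();
pages[0xF9] = new string[256];
pages[0xF9][0x01] = "Kayng";  // U+F901
pages[0x1F1] = new string[256];
pages[0x1F1][0x01] = "0,";    // U+1F101

string Transliterate(string input)
{
    var sb = new StringBuilder();
    // EnumerateRunes() yields whole code points, so surrogate pairs stay together.
    foreach (Rune r in input.EnumerateRunes())
    {
        if (r.IsAscii) { sb.Append((char)r.Value); continue; }
        int high = r.Value >> 8;   // may exceed 255 for code points above 0xFFFF
        int low = r.Value & 0xFF;
        if (pages.TryGetValue(high, out var page) && page[low] != null)
            sb.Append(page[low]);
    }
    return sb.ToString();
}

Console.WriteLine(Transliterate("\uF901"));     // Kayng
Console.WriteLine(Transliterate("\U0001F101")); // 0,
```

Unmapped code points are silently dropped in this sketch; the real library may handle them differently.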

Reproduction:

```csharp
// p.csx
#r "nuget: Unidecode.NET, 2.1.0"

using Unidecode.NET;

var a = "更".Unidecode(); // U+F901, expected "Kayng"
var b = "🄁".Unidecode(); // U+1F101, expected "0,"

Console.WriteLine(a);
Console.WriteLine(b);
```

Run with:

```
dotnet script p.csx
```

Output (the second line, where `0,` is expected, is empty):

Kayng

TETYYS, Dec 17 '21 21:12