svelte-jsoneditor

Unicode and invisible characters

Open AlexRMU opened this issue 11 months ago • 2 comments

Samples

When working with Unicode (that is, almost any text), there are many pitfalls to keep in mind. Here are a few of them:

  • Invisible characters: These can behave differently depending on the device, browser, and font. They are usually not visible, but they are still part of the string and can still take up space.

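    "឴" != "";     // true: the left string holds one invisible character
    "_឴_" != "__"; // true: the same character sits between the underscores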
    

    This is how VS Code highlights them: [screenshot]

  • Combining characters and cursed strings: How combining characters are rendered depends on many factors. They often display strangely and can break the interface and its styles.

    This is how they are currently displayed in the editor: [screenshot]

    This is how VS Code displays them: [screenshot]

  • Surrogate pairs and normalization: https://en.wikipedia.org/wiki/Unicode_equivalence, https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize

    "_._".normalize(); // "_._"
    "_._".normalize("NFC"); // "_._"
    "_._".normalize("NFD"); // "_._"
    "_._".normalize("NFKC"); // "_._"
    "_._".normalize("NFKD"); // "_._"
    
    const name1 = "\u0041\u006d\u00e9\u006c\u0069\u0065"; // "Amélie" with precomposed é (U+00E9)
    const name2 = "\u0041\u006d\u0065\u0301\u006c\u0069\u0065"; // "Amélie" as e + U+0301 COMBINING ACUTE ACCENT
    name1 != name2; // true: "Amélie" != "Amélie"
    name1.length != name2.length; // true: 6 != 7
    
    const name1NFC = name1.normalize("NFC");
    const name2NFC = name2.normalize("NFC");
    name1NFC == name2NFC; // true: "Amélie" == "Amélie"
    name1NFC.length == name2NFC.length; // true: 6 == 6
    

    Before and after formatting: [screenshot]
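All of these samples come down to the fact that a JavaScript string's `length` counts UTF-16 code units, not the characters a user sees. A minimal sketch, assuming a runtime with `Intl.Segmenter` (Node.js 16+ or a current browser), that compares code units, code points, grapheme clusters and combining marks; the helper name `describeLengths` is hypothetical:

```javascript
// Hypothetical helper: compares several ways of "counting characters".
function describeLengths(str) {
  // granularity "grapheme" splits text into user-perceived characters
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    codeUnits: str.length,                          // UTF-16 code units
    codePoints: [...str].length,                    // the string iterator yields code points
    graphemes: [...segmenter.segment(str)].length,  // grapheme clusters
    combiningMarks: (str.match(/\p{M}/gu) ?? []).length, // general category Mark
  };
}

console.log(describeLengths("Am\u00e9lie"));
// { codeUnits: 6, codePoints: 6, graphemes: 6, combiningMarks: 0 }
console.log(describeLengths("Ame\u0301lie"));
// { codeUnits: 7, codePoints: 7, graphemes: 6, combiningMarks: 1 }
console.log(describeLengths("\u{1F600}")); // one emoji stored as a surrogate pair
// { codeUnits: 2, codePoints: 1, graphemes: 1, combiningMarks: 0 }
```

For a "cursed" string stacked with combining marks, `graphemes` stays small while `codeUnits` and `combiningMarks` grow, so an editor could flag strings with an unusually high ratio; any such threshold would be a design choice, not something this issue specifies.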


Everything seems to be fine with this in the editor now. I suggest:

  • highlight invisible characters
  • automatically normalize and decode all strings when pasting or formatting
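To highlight invisible characters, the editor first has to locate them. A minimal sketch, assuming "invisible" is approximated by the Unicode `Default_Ignorable_Code_Point` property (supported in JavaScript regular expressions with the `u` flag); `findInvisibleChars` is a hypothetical name, not a svelte-jsoneditor API:

```javascript
// Hypothetical helper, not part of svelte-jsoneditor.
// Default_Ignorable_Code_Point covers e.g. U+200B ZERO WIDTH SPACE,
// U+2060 WORD JOINER, U+FEFF ZERO WIDTH NO-BREAK SPACE and U+00AD SOFT HYPHEN.
const INVISIBLE = /\p{Default_Ignorable_Code_Point}/gu;

function findInvisibleChars(str) {
  const found = [];
  // matchAll works on an internal copy of the regex, so the shared
  // global INVISIBLE pattern keeps its lastIndex at 0
  for (const match of str.matchAll(INVISIBLE)) {
    const cp = match[0].codePointAt(0);
    found.push({
      index: match.index, // UTF-16 offset, usable with String.prototype.slice
      codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
    });
  }
  return found;
}

// Returns two matches: U+200B at index 1 and U+2060 at index 3.
console.log(findInvisibleChars("a\u200Bb\u2060c"));
```

This is only an approximation: the property also includes U+200D ZERO WIDTH JOINER and variation selectors, which legitimately appear inside emoji sequences, so a real highlighter would probably need to skip those when they belong to a visible grapheme cluster.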
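The second suggestion, normalizing on paste or format, could walk the parsed JSON and normalize every string, keys included. A minimal sketch, assuming NFC as the default form; `normalizeJson` is a hypothetical name, not a svelte-jsoneditor API:

```javascript
// Hypothetical helper, not part of svelte-jsoneditor: returns a copy of a
// parsed JSON value with every string (object keys included) normalized.
function normalizeJson(value, form = "NFC") {
  if (typeof value === "string") {
    return value.normalize(form);
  }
  if (Array.isArray(value)) {
    return value.map((item) => normalizeJson(item, form));
  }
  if (value !== null && typeof value === "object") {
    // Object.fromEntries defines own properties, so a "__proto__" key
    // coming from JSON.parse stays an ordinary key
    return Object.fromEntries(
      Object.entries(value).map(([key, item]) => [
        key.normalize(form),
        normalizeJson(item, form),
      ]),
    );
  }
  return value; // numbers, booleans and null are left unchanged
}

const pasted = JSON.parse('{"Ame\\u0301lie": ["Ame\\u0301lie", 1, null]}');
const normalized = normalizeJson(pasted);
console.log(Object.keys(normalized)[0] === "Am\u00e9lie"); // true
```

Two design points: keys that differ only in normalization collapse into one, and `Object.fromEntries` silently keeps the later value, so an editor might prefer to warn instead. NFC is the conservative choice; NFKC also folds compatibility characters (for example "ﬁ" becomes "fi"), which changes the visible content.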

AlexRMU · Feb 29 '24 07:02