`cssSelector` doesn't handle combining characters correctly
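```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.junit.jupiter.api.Test;

class CssSelectorCombiningCharactersTest
{
    @Test
    void combiningCharactersInIdentifier()
    {
        // The class name is "e" followed by U+0301 COMBINING ACUTE ACCENT
        final String html = """
            <html>
            <head>
            <meta charset="utf-8">
            </head>
            <body>
            <img class="e\u0301" src="/corner.jpg">
            </body>
            </html>""";
        final Document document = Jsoup.parse(html);
        final Elements images = document.getElementsByTag("img");
        final Element img = images.get(0);
        final String cssSelector = img.cssSelector();
        // Expected: the class name emitted as-is, with no escape before the combining mark
        assertEquals("html > body > img.e\u0301", cssSelector);
    }
}
```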
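The example above uses a combining character to build an é: a plain `e` followed by U+0301 COMBINING ACUTE ACCENT. Emoji make heavy use of similar multi-character sequences: 👨‍👨‍👧‍👧 is made up of 11 Java `char`s (`\uD83D\uDC68\u200D\uD83D\uDC68\u200D\uD83D\uDC67\u200D\uD83D\uDC67`), that is, four surrogate-pair emoji joined by three zero-width joiners (U+200D).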
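To make those counts concrete, here is a small sketch (the class and method names are illustrative, not from jsoup) that compares the UTF-16 `char` count, the code-point count, and the length after NFC normalization for both strings.

```java
import java.text.Normalizer;
import java.util.Arrays;

public class CharCounts {
    // Returns {UTF-16 chars, code points, UTF-16 chars after NFC normalization}
    static int[] counts(String s) {
        return new int[] {
            s.length(),
            s.codePointCount(0, s.length()),
            Normalizer.normalize(s, Normalizer.Form.NFC).length()
        };
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT
        String decomposed = "e\u0301";
        // Four emoji (each a surrogate pair) joined by three U+200D ZERO WIDTH JOINERs
        String family = "\uD83D\uDC68\u200D\uD83D\uDC68\u200D\uD83D\uDC67\u200D\uD83D\uDC67";

        System.out.println(Arrays.toString(counts(decomposed))); // [2, 2, 1]
        System.out.println(Arrays.toString(counts(family)));     // [11, 7, 11]
    }
}
```

NFC folds the decomposed é into the single code point U+00E9, but it leaves the emoji sequence untouched, so normalization alone does not remove multi-character sequences from class names.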
I have seen emoji used as CSS class names in the wild. I think the character-escaping code behind `cssSelector()` does the wrong thing: it appears to escape every character individually, which separates combining characters from the characters they modify.
Current jsoup: `html > body > img.e\́` (a backslash followed by the literal U+0301)
Chrome: `body > p.e\\u0301`
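For comparison with per-character escaping, below is a minimal sketch of the CSSOM "serialize an identifier" algorithm. It is not jsoup's code; `CssIdent` and `serializeIdentifier` are hypothetical names. It walks the identifier by code point (so surrogate pairs are never split) and emits every code point at or above U+0080 unchanged, which means both `e` + U+0301 and emoji sequences pass through intact. Only ASCII punctuation, control characters, and a leading digit get escaped.

```java
public class CssIdent {
    // Sketch of the CSSOM "serialize an identifier" algorithm.
    // Iterates by code point, so surrogate pairs are never split.
    static String serializeIdentifier(String ident) {
        int[] cps = ident.codePoints().toArray();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < cps.length; i++) {
            int cp = cps[i];
            if (cp == 0) {
                out.appendCodePoint(0xFFFD); // NULL becomes REPLACEMENT CHARACTER
            } else if ((cp >= 0x01 && cp <= 0x1F) || cp == 0x7F
                    || (i == 0 && isDigit(cp))
                    || (i == 1 && isDigit(cp) && cps[0] == '-')) {
                // Escape as code point: backslash, lowercase hex, one space
                out.append('\\').append(Integer.toHexString(cp)).append(' ');
            } else if (i == 0 && cp == '-' && cps.length == 1) {
                out.append("\\-"); // a lone "-" must be escaped
            } else if (cp >= 0x80 || cp == '-' || cp == '_' || isDigit(cp)
                    || (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')) {
                // Non-ASCII code points, including combining marks, pass through
                out.appendCodePoint(cp);
            } else {
                out.append('\\').appendCodePoint(cp); // escape other ASCII, e.g. "."
            }
        }
        return out.toString();
    }

    static boolean isDigit(int cp) {
        return cp >= '0' && cp <= '9';
    }

    public static void main(String[] args) {
        System.out.println(serializeIdentifier("e\u0301")); // unchanged
        System.out.println(serializeIdentifier("1a"));       // \31 a
        System.out.println(serializeIdentifier("a.b"));      // a\.b
    }
}
```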
I don't think it's incorrect to emit it as a run of characters, and the selector does work in jsoup. We could improve it by escaping the combining character as a `\u`-style escape, as Chrome does.
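A minimal sketch of that improvement, using CSS's own hex-escape syntax rather than Chrome's exact notation. `CombiningEscape` and `escapeCombiningMarks` are hypothetical names, not jsoup API. The helper writes each combining mark (Unicode categories Mn, Mc, Me) as a backslash, the code point in lowercase hex, and a terminating space, so U+0301 becomes `\301` followed by a space; everything else is left alone, including U+200D, which is a format character (Cf) rather than a mark. It covers only combining marks and would sit alongside whatever ASCII escaping the selector builder already does. Per the CSS Syntax spec, a backslash followed by any character that is not a hex digit or newline also stands for that character, which is consistent with the current output still matching.

```java
public class CombiningEscape {
    // Hypothetical helper: true for Unicode general categories Mn, Me, Mc
    static boolean isCombiningMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
                || type == Character.ENCLOSING_MARK
                || type == Character.COMBINING_SPACING_MARK;
    }

    // Writes combining marks as CSS hex escapes ("\301 "); other code points unchanged
    static String escapeCombiningMarks(String ident) {
        StringBuilder out = new StringBuilder();
        ident.codePoints().forEach(cp -> {
            if (isCombiningMark(cp)) {
                // The trailing space terminates the hex escape
                out.append('\\').append(Integer.toHexString(cp)).append(' ');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints e\301 followed by a space
        System.out.println(escapeCombiningMarks("e\u0301"));
    }
}
```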