
Don't convert HTML Entities to Unicode.

Open dkechag opened this issue 3 years ago • 0 comments

I've been using an old Perl CSS inliner, which is quite good but very, very slow (yes, slow even for pure Perl). I was happy to see this Rust-powered solution, and it is indeed over 100x faster, but it has a "feature" that more or less breaks it for me, and I don't understand the reason for it. Given this input:

<html>
<head><meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type"></head>
<body>Here&rsquo;s Johnny</body>
</html>

I get:

<html><head><meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type"></head>
<body>Here’s Johnny

</body></html>

This shows up as Hereâ€™s Johnny in the browser, because the page declares ISO-8859-1 but the character is now written as UTF-8 bytes. One could ask why I use the right single quote at all (although it is very common, for typographic reasons, in place of the apostrophe), but that was just an example: even entities like &pound; get translated to 0xC2 0xA3, which looks quite bad unless your charset is UTF-8. That is not great if you are trying to inline various documents that you did not create and that were not written with that limitation in mind.

I went through the Python wrapper code and it does nothing special apart from calling the Rust package, and in the Rust docs I don't see any control over (or mention of) this behaviour. So the issue might technically be with the Servo components rather than with css-inline, but I thought I'd ask in case I'm missing something.
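To illustrate the mismatch, and one workaround that could be applied on the caller's side, here is a minimal Python sketch. It does not call css-inline at all: `inlined` is a hypothetical stand-in for the string the inliner returns, and re-encoding to numeric character references is a post-processing assumption of mine, not an option the library is known to offer.

```python
# Hypothetical stand-in for the inliner's output: the entities have
# already been replaced by the Unicode characters U+2019 and U+00A3.
inlined = "<body>Here\u2019s Johnny, \u00a35</body>"

# The output is written as UTF-8, so the pound sign becomes 0xC2 0xA3.
utf8_bytes = inlined.encode("utf-8")

# A browser that honours charset=iso-8859-1 decodes those bytes with a
# single-byte table (in practice windows-1252), which yields mojibake.
as_seen = utf8_bytes.decode("cp1252")

# Possible workaround: turn every non-ASCII character into a numeric
# character reference, so the markup no longer depends on the charset.
ascii_safe = inlined.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(ascii_safe)
```

This does not restore the original named entities (&rsquo; comes back as &#8217;), but the result should render the same under ISO-8859-1 or UTF-8. It also rewrites any non-ASCII characters that were literal in the input, which may or may not be acceptable.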

dkechag, Mar 08 '22 19:03