Add a heuristic for determining the charset

Open • oliverklee opened this issue 1 year ago • 3 comments

From https://github.com/MyIntervals/PHP-CSS-Parser/pull/688#issuecomment-2330767391:

In essence: have some heuristic to determine the input encoding (BOM, @charset, try a few common charsets and pick the first one that doesn’t produce errors), then convert to UTF-8 and, from that point on, all the tokens of interest to us will be ASCII-only and can be parsed using regular string functions.

oliverklee, Sep 05 '24 07:09

We can follow what browsers do: https://developer.mozilla.org/en-US/docs/Web/CSS/@charset

oliverklee, Sep 05 '24 07:09

We can follow what browsers do: https://developer.mozilla.org/en-US/docs/Web/CSS/@charset

Yes, good idea. Browsers, though, have a Content-Type header that may include a charset= specifier, which we don’t have (nor do we have the resolved charset of the referring document). But we can definitely follow what browsers do in the absence of charset=.

sabberworm, Sep 05 '24 08:09

Though browsers have a Content-Type header that may include a charset= specifier that we don’t have (as well as the resolved charset of the referring document).

We can use the value provided to Settings::withDefaultCharset in its place.

JakeQZ, Sep 05 '24 15:09
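
For illustration, the detection order discussed in this thread (BOM, then @charset, then a configured default, then trial decoding of a few common charsets) could be sketched roughly as follows. This is not the library's code: it is written in Python only to keep the example short, and the names `decode_css`, `default_charset` and `candidates` are hypothetical. `default_charset` stands in for the value passed to Settings::withDefaultCharset, and the candidate list is an assumption, not something agreed on above.

```python
import codecs
import re

# UTF-32 BOMs are listed before UTF-16 ones because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Browsers only honour @charset when it is the very first thing in the
# file and is written exactly as: @charset "name";
_CHARSET_RULE = re.compile(rb'^@charset "([^"]*)";')


def _try_decode(data, encoding):
    """Return the decoded text, or None if the name is unknown or decoding fails."""
    try:
        return data.decode(encoding)
    except (LookupError, ValueError):
        return None


def decode_css(data, default_charset=None, candidates=("utf-8", "windows-1252")):
    """Guess the encoding of raw CSS bytes; return (text, encoding_used)."""
    # 1. A byte order mark is treated as decisive.
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding, errors="replace"), encoding

    # 2. An @charset rule at the very start of the file.
    match = _CHARSET_RULE.match(data)
    if match:
        name = match.group(1).decode("ascii", errors="replace")
        text = _try_decode(data, name)
        if text is not None:
            return text, name

    # 3. The caller-supplied default (hypothetical stand-in for the
    #    charset= information a browser would get from Content-Type).
    if default_charset is not None:
        text = _try_decode(data, default_charset)
        if text is not None:
            return text, default_charset

    # 4. Try common charsets and keep the first that decodes without errors.
    for encoding in candidates:
        text = _try_decode(data, encoding)
        if text is not None:
            return text, encoding

    # Last resort: decode as UTF-8 and replace undecodable bytes.
    return data.decode("utf-8", errors="replace"), "utf-8"
```

Once the input has been decoded this way, the result can be re-encoded as UTF-8, so the ASCII-only tokens the parser cares about can be matched with ordinary string functions, as suggested in the opening comment.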