
Implement charset detection from the first 1024 bytes of the HTML

Status: Open · krichprollsch opened this issue 7 months ago · 0 comments

Label: charset

In browser.zig, when the document is HTML, we should try to determine the charset from a meta tag within the first 1024 bytes of the document.
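A minimal sketch of that scan, written in Python for illustration rather than Zig. It is a simplified take on the WHATWG "prescan a byte stream" idea: it truncates to 1024 bytes, skips comments, prefers a `charset` attribute, and otherwise reads the charset from `<meta http-equiv="Content-Type" content="...">`. The function name `detect_meta_charset` is hypothetical, and the sketch does not validate labels against the Encoding Standard or handle every tokenizer edge case (for example, a `>` inside a quoted attribute value).

```python
import re
from typing import Optional

# Only the first 1024 bytes are examined, per the constraint cited in the issue.
PRESCAN_LIMIT = 1024

# Complete comments, or an unterminated comment running to the end of the window.
_COMMENT_RE = re.compile(rb"<!--.*?(?:-->|\Z)", re.DOTALL)
# A <meta ...> tag; group 1 holds the raw attribute text.
_META_RE = re.compile(rb"<meta\b([^>]*)>", re.IGNORECASE)
# name=value pairs with double-quoted, single-quoted, or unquoted values.
_ATTR_RE = re.compile(rb"""([a-zA-Z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""")
# charset=... inside a Content-Type style content attribute.
_CONTENT_CHARSET_RE = re.compile(rb"""charset\s*=\s*["']?([^\s;"']+)""", re.IGNORECASE)


def _label(raw: bytes) -> Optional[str]:
    """Decode a charset label as ASCII and normalize case; empty labels count as missing."""
    label = raw.decode("ascii", errors="replace").strip().lower()
    return label or None


def detect_meta_charset(document: bytes) -> Optional[str]:
    """Return the charset declared by the first usable meta tag, or None."""
    head = _COMMENT_RE.sub(b"", document[:PRESCAN_LIMIT])
    for meta in _META_RE.finditer(head):
        attrs = {}
        for m in _ATTR_RE.finditer(meta.group(1)):
            value = next(g for g in m.group(2, 3, 4) if g is not None)
            # As in HTML, a duplicated attribute keeps its first value.
            attrs.setdefault(m.group(1).lower(), value)

        # <meta charset="..."> is the preferred form.
        if b"charset" in attrs:
            label = _label(attrs[b"charset"])
            if label:
                return label

        # Fallback: <meta http-equiv="Content-Type" content="text/html; charset=...">
        if attrs.get(b"http-equiv", b"").strip().lower() == b"content-type":
            match = _CONTENT_CHARSET_RE.search(attrs.get(b"content", b""))
            if match:
                label = _label(match.group(1))
                if label:
                    return label
    return None
```

Regular expressions keep the sketch short. A port into browser.zig would more likely be a small byte-by-byte state machine over the 1024-byte window, which avoids allocation and mirrors the spec's prescan steps more closely.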

The meta element can be used, and the charset attribute is preferred [html5:0]. If there is no HTTP declaration or BOM, a meta element must be used [html5:14]. Any meta declaration must use an ASCII-compatible encoding [html5:14] [html5:16]; as a consequence, UTF-16-encoded pages must not use a meta declaration. Any meta declaration must fit within the first 1024 bytes of the page [html5:12] [html5:23].

Sources:
https://www.w3.org/International/articles/spec-summaries/encoding
https://www.w3.org/International/questions/qa-html-encoding-declarations
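The BOM and ASCII-compatibility rules above could be handled as a sketch like the following, again in Python for illustration. `sniff_bom` checks for a UTF-8 or UTF-16 byte order mark. `normalize_meta_charset` follows our reading of the WHATWG HTML prescan, which replaces a UTF-16 label found in a meta tag with UTF-8 (since the bytes were readable as ASCII, the page cannot really be UTF-16) and maps `x-user-defined` to windows-1252. Both function names are hypothetical, and the UTF-16 label set is deliberately partial rather than the full list from the Encoding Standard.

```python
from typing import Optional

# Byte order marks and the encodings they indicate.
_BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
)

# Partial set of labels that resolve to UTF-16; not exhaustive.
_UTF16_LABELS = {"utf-16", "utf-16le", "utf-16be"}


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def normalize_meta_charset(label: str) -> str:
    """Adjust a charset label taken from a meta tag.

    A meta declaration is only readable in an ASCII-compatible encoding,
    so a UTF-16 label there is treated as UTF-8.
    """
    label = label.strip().lower()
    if label in _UTF16_LABELS:
        return "utf-8"
    if label == "x-user-defined":
        return "windows-1252"
    return label
```

A BOM, when present, would be checked before any meta scan, because it settles the encoding outright.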

If we find no charset, we should fall back to mime.charset and, failing that, default to UTF-8.
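The fallback order proposed here can be sketched as one small function. `resolve_charset` is a hypothetical name. `meta_charset` stands for the result of the meta scan, and `mime_charset` stands for `mime.charset`, which we assume is the charset parameter parsed from the Content-Type header. Empty or whitespace-only values are treated as missing.

```python
from typing import Optional

DEFAULT_CHARSET = "utf-8"


def resolve_charset(meta_charset: Optional[str], mime_charset: Optional[str]) -> str:
    """Pick the document charset: meta tag first, then mime.charset, then UTF-8."""
    for candidate in (meta_charset, mime_charset):
        if candidate is not None and candidate.strip():
            return candidate.strip().lower()
    return DEFAULT_CHARSET
```

The sketch keeps the order stated in this issue. The WHATWG encoding sniffing algorithm consults the BOM and the transport-layer (HTTP) charset before the meta prescan, so swapping the first two candidates would bring it closer to that algorithm if spec conformance becomes the goal.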

krichprollsch, Apr 14 '25 18:04