Implement charset detection from the first 1024 bytes of the HTML
charset
In browser.zig, when the document is HTML, we should try to determine the charset from a meta tag within the first 1024 bytes of the document.
The meta element can be used, and the charset attribute is preferred [html5:0]. If there is no HTTP declaration or BOM, a meta element must be used [html5:14]. Any meta declaration must use an ASCII-compatible encoding [html5:14] [html5:16]. The implication of this is that UTF-16 encoded pages must not use a meta declaration. Any meta declaration must fit in the first 1024 bytes of the page [html5:12] [html5:23].
https://www.w3.org/International/articles/spec-summaries/encoding
https://www.w3.org/International/questions/qa-html-encoding-declarations
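A minimal sketch of what the scan could look like, assuming a standalone helper rather than the actual browser.zig code. The names `sniffMetaCharset`, `charsetInTag`, `skipWhitespace`, `nonEmpty` and `prescan_len` are hypothetical. This is a simplified search, not the full HTML prescan algorithm: it does not skip comments, and it does not handle a `>` inside a quoted attribute value.

```zig
const std = @import("std");

/// Number of leading bytes inspected for a meta declaration.
const prescan_len: usize = 1024;

/// Hypothetical helper: returns the charset label declared by a <meta> tag
/// within the first 1024 bytes of `html`, or null if none is found.
/// The returned slice points into `html`.
pub fn sniffMetaCharset(html: []const u8) ?[]const u8 {
    const head = html[0..@min(html.len, prescan_len)];
    var pos: usize = 0;
    while (std.ascii.indexOfIgnoreCasePos(head, pos, "<meta")) |start| {
        const tag_start = start + "<meta".len;
        pos = tag_start;
        if (tag_start >= head.len) break;
        // Reject longer tag names such as <metadata>.
        if (!std.ascii.isWhitespace(head[tag_start]) and head[tag_start] != '/') continue;
        // Simplification: the tag ends at the next '>' or at the end of the window.
        const tag_end = std.mem.indexOfScalarPos(u8, head, tag_start, '>') orelse head.len;
        if (charsetInTag(head[tag_start..tag_end])) |label| return label;
        pos = tag_end;
    }
    return null;
}

/// Finds `charset = value` inside one tag. This covers both
/// `charset="utf-8"` and `content="text/html; charset=utf-8"`.
fn charsetInTag(tag: []const u8) ?[]const u8 {
    var pos: usize = 0;
    while (std.ascii.indexOfIgnoreCasePos(tag, pos, "charset")) |found| {
        var i = found + "charset".len;
        pos = i;
        i = skipWhitespace(tag, i);
        if (i >= tag.len or tag[i] != '=') continue;
        i = skipWhitespace(tag, i + 1);
        if (i >= tag.len) return null;
        // Quoted value: read up to the matching quote.
        if (tag[i] == '"' or tag[i] == '\'') {
            const end = std.mem.indexOfScalarPos(u8, tag, i + 1, tag[i]) orelse return null;
            return nonEmpty(tag[i + 1 .. end]);
        }
        // Unquoted value: stop at whitespace, ';', or the quote closing content="...".
        var end = i;
        while (end < tag.len) : (end += 1) {
            switch (tag[end]) {
                ' ', '\t', '\n', '\r', 0x0C, ';', '"', '\'' => break,
                else => {},
            }
        }
        return nonEmpty(tag[i..end]);
    }
    return null;
}

fn skipWhitespace(s: []const u8, start: usize) usize {
    var i = start;
    while (i < s.len and std.ascii.isWhitespace(s[i])) : (i += 1) {}
    return i;
}

fn nonEmpty(s: []const u8) ?[]const u8 {
    return if (s.len == 0) null else s;
}
```

Because the search window is cut at 1024 bytes before any matching, a declaration that starts later in the document is ignored, which matches the constraint quoted above.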
If no charset is found in the meta tag, we should fall back to mime.charset, and finally default to utf-8.
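The fallback order described here could be expressed as follows. This is only a sketch: `resolveCharset` and its parameters are hypothetical, with `meta_charset` standing for the result of the meta scan and `mime_charset` standing for the value of mime.charset, both assumed to be optional slices.

```zig
const std = @import("std");

/// Hypothetical helper: picks the charset in the order proposed in this issue.
/// 1. charset found in a meta tag, 2. charset from the MIME type, 3. utf-8.
pub fn resolveCharset(meta_charset: ?[]const u8, mime_charset: ?[]const u8) []const u8 {
    return meta_charset orelse (mime_charset orelse "utf-8");
}
```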