Henri Sivonen
> Their rationale cites ISO-2022-JP, which seems like a red herring. The paragraphs referring to VDCs are directly relevant to this issue.
We have telemetry from Firefox 86 that is best explained by the hypothesis that users in Taiwan and Hong Kong encounter unlabeled Big5 containing byte sequences that the Encoding Standard...
AR PL UMing HK on Ubuntu 20.04 provides glyphs for U+EEFF and U+F7EF through U+F816, inclusive.
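The code points listed above all lie in the Basic Multilingual Plane's Private Use Area (U+E000 through U+F8FF). As a minimal sketch, not part of the original comment, the following checks this with Python's standard `unicodedata` module; the names `CODE_POINTS` and `is_private_use` are made up for illustration.

```python
import unicodedata

# Code points named in the comment above: U+EEFF and U+F7EF..U+F816 inclusive.
CODE_POINTS = [0xEEFF] + list(range(0xF7EF, 0xF816 + 1))


def is_private_use(cp: int) -> bool:
    """Return True if the code point has general category Co (private use)."""
    return unicodedata.category(chr(cp)) == "Co"


# True only if every listed code point is a private-use code point.
all_private = all(is_private_use(cp) for cp in CODE_POINTS)
```

Because private-use code points have no standardized meaning, whether they render at all depends on the installed fonts, which is why the font is named in the observation above.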
> Blink began to use Compact Encoding Detector (google/compact_enc_det) when no encoding label is found (http, meta).
Whoa! Do you mean Blink now uses more unspecified heuristics than...
Also, does existing Web content actually require ISO-2022-JP to be heuristically detected (as opposed to heuristically detecting between Shift_JIS and EUC-JP)? Wouldn't it be a security improvement not to detect...
Also, letting a detector decide that input should be decoded as the replacement encoding is most likely a bad idea. Consider an unlabeled site whose markup template and site-provided content...
@jungshik, I have some more questions: - Did you explore guessing the encoding from the top-level domain name instead of guessing it from content? - How many bytes do you...
> Realized these questions better be answered by the person who ported the new detector to Blink
Thank you for the answer.
> TLD is utilized in the way that...
> What are 'new' TLDs?
I mean TLDs that are neither the ancient ones (.com, .org, .net, .edu, .gov, .mil) nor country TLDs. That is, TLDs like .mobi, .horse, .xyz,...
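The three-way split described above can be sketched as follows. This is only an illustration under stated assumptions: country TLDs are approximated as any two-letter ASCII label, internationalized country TLDs (`xn--` labels) are ignored, and the names `LEGACY_TLDS` and `classify_tld` are hypothetical rather than taken from any browser's code.

```python
# The "ancient" generic TLDs listed in the comment above.
LEGACY_TLDS = {"com", "org", "net", "edu", "gov", "mil"}


def classify_tld(hostname: str) -> str:
    """Classify a hostname's TLD as 'legacy', 'country', or 'new'."""
    # Take the last dot-separated label, ignoring a trailing root dot.
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in LEGACY_TLDS:
        return "legacy"
    # Assumption: every two-letter ASCII label is treated as a country TLD.
    if len(tld) == 2 and tld.isascii() and tld.isalpha():
        return "country"
    return "new"
```

For example, `classify_tld("example.xyz")` falls through both checks and is reported as `"new"`.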
> It feeds the first chunk of the document received from network
FWIW, this is *so* dependent on IO implementation details that the sniffing limit is different for file: URLs...