Jinsuk Kim

8 comments by Jinsuk Kim

I realized these questions are better answered by the person who ported the new detector to Blink (which is me):

> Did you explore guessing the encoding from the top-level domain...

Please note that I'm not working on Blink. I ported CED as a drop-in replacement for the ICU encoding detector so that any Blink-based web browser can benefit from it. I'm...

I missed this:

> Have you estimated the success rate of looking at the TLD only versus looking at the first network buffer and falling back to looking at the...

Expecting input from @hsivonen, @annevk, etc.

I'm planning to get some stats to understand the impact of switching the size to 1024 (i.e. documents whose encoding guess differs with the input size: 1024 vs. 2048,...

Ran the detector over a substantial number of unlabelled documents and got the following stats:

```
input size   coverage (%)
1K           84.36
2K           92.86
3K           96.28
4K           98.60
```

(The remaining 1.40%...
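For readers who want to reproduce this kind of measurement, here is a minimal sketch of one way to compute such a table. It rests on an assumption not confirmed in the comment: that "coverage" means the share of documents whose guess from the first N bytes matches the guess from the whole document. The names `prefix_agreement` and `toy_detect` are hypothetical; `toy_detect` is only a stand-in for a real detector such as CED and is not how CED works.

```python
from typing import Callable, Dict, Iterable, Sequence


def toy_detect(data: bytes) -> str:
    """Stand-in detector (an assumption for illustration, not CED)."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "windows-1252"


def prefix_agreement(
    docs: Iterable[bytes],
    detect: Callable[[bytes], str],
    sizes: Sequence[int] = (1024, 2048, 3072, 4096),
) -> Dict[int, float]:
    """Percentage of documents whose guess from the first N bytes
    equals the guess from the full document, for each N in sizes."""
    hits = {n: 0 for n in sizes}
    total = 0
    for doc in docs:
        total += 1
        full_guess = detect(doc)
        for n in sizes:
            # Slicing past the end simply yields the whole document.
            if detect(doc[:n]) == full_guess:
                hits[n] += 1
    if total == 0:
        return {n: 0.0 for n in sizes}
    return {n: 100.0 * hits[n] / total for n in sizes}


if __name__ == "__main__":
    ascii_doc = b"a" * 5000
    # The only non-ASCII character sits after byte 1500, so a 1K prefix misses it.
    late_utf8_doc = b"a" * 1500 + "\u00e9".encode("utf-8") + b"a" * 100
    for size, pct in prefix_agreement([ascii_doc, late_utf8_doc], toy_detect).items():
        print(f"{size:>5} bytes: {pct:.2f}%")
```

Swapping `toy_detect` for a wrapper around a real detector leaves the loop unchanged. Note that a byte-count prefix can cut a multi-byte character in half, which may itself change the guess.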

> Earlier in the sentence you said you wouldn't feed things into the parser so this wouldn't apply, right?

That's correct - there won't be reparsing. Feeding will be delayed...

This will give you an idea of how CED works in a web-standards-compliant way: https://github.com/google/compact_enc_det/blob/master/compact_enc_det/compact_enc_det.cc#L5649

> It's still unclear to me what motivated the CED detector as other browsers don't have...