HTML filtering not working on GBK-encoded webpages

Open ghcsd opened this issue 7 years ago • 7 comments

One or more specific URLs where the issue occurs

http://www.163.com/ http://www.qq.com/

Steps for anyone to reproduce the issue

  1. Add the following filters to My Filters:

www.qq.com##^body
www.163.com##^body

  2. Browse to http://www.163.com/
  3. The page's body has not been removed.

Your settings

  • OS/version: win 10 x64
  • Browser/version: Firefox 57
  • uBlock Origin version: 1.14.23b13
Your filter lists

Default

ghcsd avatar Jan 05 '18 16:01 ghcsd

Given that GBK is listed at 0.2% (and that figure will only go down with time), I will decline, especially since you do not provide an actual case that only HTML filtering can solve.

gorhill avatar Jan 05 '18 16:01 gorhill

qq.com is one of the largest news sites in China, if not the largest, and it is owned by Tencent. Although I don't have a specific issue, Tencent's user base shouldn't be underestimated.

The statistics you've shown seem to count the number of websites using the encoding, not the number of users affected by the encoding.

https://www.alexa.com/siteinfo/qq.com

jspenguin2017 avatar Jan 06 '18 03:01 jspenguin2017

I should add that GB2312 is a subset of GBK.
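
The subset relationship can be checked with a minimal sketch. It uses Python's built-in `gb2312` and `gbk` codecs as a stand-in for the browser's decoders, which is an assumption for illustration only; the helper name `encodes_as` and the sample strings are hypothetical and not part of uBlock Origin.

```python
# Illustrative only: Python's built-in "gb2312" and "gbk" codecs stand in
# for the browser decoders discussed in this thread.


def encodes_as(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


simplified = "中文"  # characters that are part of GB2312
traditional = "國"   # traditional form, absent from GB2312 but present in GBK

# Text within GB2312 produces identical bytes under both codecs,
# so a GBK decoder reads GB2312 content unchanged.
gb2312_bytes = simplified.encode("gb2312")
assert gb2312_bytes == simplified.encode("gbk")
assert gb2312_bytes.decode("gbk") == simplified

# GBK is the larger set: it covers characters that GB2312 lacks.
assert not encodes_as(traditional, "gb2312")
assert encodes_as(traditional, "gbk")
```

In practice this means a single GBK decoder should be enough for pages labelled with either name.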

ghcsd avatar Jan 06 '18 05:01 ghcsd

https://en.wikipedia.org/wiki/GB_2312#Two_implementations_of_GB2312:

W3C's technical recommendation specifies a GBK encoding to be inferred for streams labelled gb2312, which in turn uses a GB18030 decoder.

gorhill avatar Jan 06 '18 14:01 gorhill

In 2017 QQ users went down to 850M, surpassed only by WeChat with 963M users.

Ref: https://github.com/gorhill/uBlock/issues/3405#issuecomment-355719242

Atavic avatar Jan 06 '18 21:01 Atavic

I might have to ship 1.15.0 without encoding for Shift JIS and GBK available -- these are anything but trivial size-wise. It's a pity that these encoding algorithms are present in the browser, but unavailable to extensions.

gorhill avatar Jan 07 '18 21:01 gorhill

Looks like TextDecoder can still decode it, just TextEncoder can't encode it due to awesome specs. If you modify the header and transcode the stream, does it still break? Or are you worried about performance?
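
The transcoding idea (decode the GBK stream, re-encode it as UTF-8, and relabel the response) can be sketched roughly as follows. This is a Python illustration using the standard `codecs` module, not the extension's code and not the browser API; the function name `transcode_to_utf8` and the sample page are hypothetical. The point it shows is that an incremental decoder is needed, because a network chunk may end in the middle of a two-byte GBK character.

```python
import codecs
from typing import Iterable, Iterator


def transcode_to_utf8(chunks: Iterable[bytes],
                      source_encoding: str = "gbk") -> Iterator[bytes]:
    """Decode a chunked byte stream incrementally and re-encode it as UTF-8.

    Hypothetical helper for illustration; undecodable bytes are replaced
    rather than raising, which is an assumed policy.
    """
    decoder = codecs.getincrementaldecoder(source_encoding)(errors="replace")
    for chunk in chunks:
        # The decoder buffers a trailing partial character until the
        # next chunk completes it.
        text = decoder.decode(chunk)
        if text:
            yield text.encode("utf-8")
    # Flush whatever is still buffered at end of stream.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.encode("utf-8")


page = "<body>中文</body>".encode("gbk")
# Split inside the first two-byte character to mimic a chunk boundary.
chunks = [page[:7], page[7:]]
result = b"".join(transcode_to_utf8(chunks))
assert result.decode("utf-8") == "<body>中文</body>"
```

Once the body is UTF-8, the charset declared in the response header (and any charset declared in the page itself) would have to be changed to match, which is the "modify the header" part of the question.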

jspenguin2017 avatar Jan 07 '18 21:01 jspenguin2017