adblockparser ||domain.com should match wss:subdomain.domain.com (but it doesn't)

Open • MadDataScience opened this issue 4 years ago • 1 comment

The regex at https://github.com/scrapinghub/adblockparser/blob/4089612d65018d38dbb88dd7f697bcb07814014d/adblockparser/parser.py#L264 appears to be too restrictive. According to https://help.eyeo.com/en/adblockplus/how-to-write-filters#anchors:

You might want to block http://example.com/banner.gif as well as https://example.com/banner.gif and http://www.example.com/banner.gif. You can do this by putting two pipe symbols in front of the filter. This ensures that the filter matches at the beginning of the domain name: ||example.com/banner.gif, and blocks all of these addresses while not blocking http://badexample.com/banner.gif or http://gooddomain.example/analyze?http://example.com/banner.gif.

If I understand this correctly, the filter should also block wss:www.example.com/banner.gif, but this implementation doesn't.

>>> from adblockparser import AdblockRules
>>> rules = AdblockRules(['||example.com/banner.gif'])
>>> rules.should_block('http://example.com/banner.gif')
True
>>> rules.should_block('http://www.example.com/banner.gif')
True
>>> rules.should_block('wss:example.com/banner.gif')
True
>>> rules.should_block('wss:www.example.com/banner.gif')
False

(should be True)
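
For illustration, here is a minimal sketch of a more permissive domain-anchor prefix. It is not the library's actual regex and not a proposed patch: `DOMAIN_ANCHOR`, `domain_anchor_regex`, and `matches` are hypothetical names, and the rule body is treated as a literal string (no `*` wildcards or `^` separators). The idea is simply to make the `//` and the subdomain part independently optional, so a subdomain can match even when the URL has no `//`.

```python
import re

# Hypothetical prefix for "||" rules (an assumption, not adblockparser's regex):
#   (?:[^:/?#]+:)?   optional scheme, e.g. "http:" or "wss:"
#   (?://)?          optional "//"
#   (?:[^/?#]*\.)?   optional subdomain part ending in a dot, e.g. "www."
DOMAIN_ANCHOR = r"^(?:[^:/?#]+:)?(?://)?(?:[^/?#]*\.)?"


def domain_anchor_regex(rule_body):
    """Compile a regex for the literal text that follows '||' in a rule."""
    return re.compile(DOMAIN_ANCHOR + re.escape(rule_body))


def matches(rule_body, url):
    """Return True if the '||' rule body matches the URL under this sketch."""
    return domain_anchor_regex(rule_body).search(url) is not None


if __name__ == "__main__":
    body = "example.com/banner.gif"
    for url in [
        "http://example.com/banner.gif",
        "http://www.example.com/banner.gif",
        "wss:example.com/banner.gif",
        "wss:www.example.com/banner.gif",
        "www.example.com/banner.gif",
        "http://badexample.com/banner.gif",
        "http://gooddomain.example/analyze?http://example.com/banner.gif",
    ]:
        print(url, matches(body, url))
```

Under these assumptions the first five URLs match and the last two (the counterexamples from the Adblock Plus documentation) do not.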

Aug 27 '21 18:08 MadDataScience

Oh, I just noticed it also doesn't block www.example.com/banner.gif (w/o the http://)

Aug 27 '21 18:08 MadDataScience
