adblockparser
||domain.com should match wss:subdomain.domain.com (but it doesn't)
The regex at https://github.com/scrapinghub/adblockparser/blob/4089612d65018d38dbb88dd7f697bcb07814014d/adblockparser/parser.py#L264 appears to be too restrictive. According to the Adblock Plus filter documentation (https://help.eyeo.com/en/adblockplus/how-to-write-filters#anchors):
You might want to block http://example.com/banner.gif as well as https://example.com/banner.gif and http://www.example.com/banner.gif. You can do this by putting two pipe symbols in front of the filter. This ensures that the filter matches at the beginning of the domain name: ||example.com/banner.gif, and blocks all of these addresses while not blocking http://badexample.com/banner.gif or http://gooddomain.example/analyze?http://example.com/banner.gif.
If I understand this correctly, it should also block wss:www.example.com/banner.gif, but in this implementation, it doesn't.
>>> from adblockparser import AdblockRules
>>> rules = AdblockRules(['||example.com/banner.gif'])
>>> rules.should_block('http://example.com/banner.gif')
True
>>> rules.should_block('http://www.example.com/banner.gif')
True
>>> rules.should_block('wss:example.com/banner.gif')
True
>>> rules.should_block('wss:www.example.com/banner.gif')
False
(should be True)
Oh, I just noticed it also doesn't block www.example.com/banner.gif (w/o the http://).
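To make the expected behaviour concrete, here is a rough sketch of a more permissive anchor pattern. It is not the regex used in parser.py: `domain_anchor_regex` is a hypothetical helper, and it assumes that the scheme, the `//`, and the subdomain prefix should each be optional independently of one another. It also treats the rest of the rule as literal text, so it ignores `^` and `*` handling.

```python
import re


def domain_anchor_regex(rest):
    """Hypothetical helper: build a regex for the rule '||' + rest.

    Assumption: scheme, '//' and subdomain prefix are each optional,
    independently of one another. The remainder of the rule is treated
    as literal text (no '^' or '*' handling).
    """
    return re.compile(
        r"^(?:[^:/?#]+:)?"   # optional scheme, e.g. 'http:' or 'wss:'
        r"(?://)?"           # optional '//'
        r"(?:[^/?#]*\.)?"    # optional subdomain prefix ending in a dot
        + re.escape(rest)
    )


pattern = domain_anchor_regex("example.com/banner.gif")

for url in [
    "http://example.com/banner.gif",
    "wss:www.example.com/banner.gif",
    "www.example.com/banner.gif",
    "http://badexample.com/banner.gif",
]:
    # Prints each URL with whether the sketch pattern matches it.
    print(url, bool(pattern.match(url)))
```

With this pattern the two URLs from the Adblock Plus documentation that must not be blocked (badexample.com and the one with the URL in the query string) still do not match, because the subdomain prefix has to end in a dot and cannot cross a `/`, `?` or `#`.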