adblock-rust icon indicating copy to clipboard operation
adblock-rust copied to clipboard

When source_url is missing, positive plus negative domain constraints would make the rule to always match

Open wavenator opened this issue 4 years ago • 4 comments

Calling check_network_urls with an empty source_url causes domain constrained rules, that has both negative plus positive, to match no matter what. The problem is in check_options function in network.rs. The following code:

if let Some(included_domains) = filter.opt_domains.as_ref() {
    if let Some(source_hashes) = request.source_hostname_hashes.as_ref() {
        // If the union of included domains is recorded
        if let Some(included_domains_union) = filter.opt_domains_union {
            // If there isn't any source hash that matches the union, there's no match at all
            if source_hashes.iter().all(|h| h & included_domains_union != *h) {
                return false
            }
        }
        if source_hashes.iter().all(|h| !utils::bin_lookup(&included_domains, *h)) {
            return false
        }
    }
}

if let Some(excluded_domains) = filter.opt_not_domains.as_ref() {
    if let Some(source_hashes) = request.source_hostname_hashes.as_ref() {
        // If the union of excluded domains is recorded
        if let Some(excluded_domains_union) = filter.opt_not_domains_union {
            // If there's any source hash that matches the union, check the actual values
            if source_hashes.iter().any(|h| (h & excluded_domains_union == *h) && utils::bin_lookup(&excluded_domains, *h)) {
                return false
            }
        } else if source_hashes.iter().any(|h| utils::bin_lookup(&excluded_domains, *h)) {
            return false
        }
    }
}

Imagine the following use case

|http://$image,domain=domain.com|~sub.domain.com

This rule should match on images, coming from the domain domain.com with an exclusion for sub.domain.com. The following check should pass, but it will block:

adblocker.check_network_urls(
    url="http://some-website.com/image.jpg",
    source_url="",
    request_type="image",
)

The solution should be and exclusion in the if statement. If there are domains inside filter.opt_domains and there is no request.source_hostname_hashes, it should never match.

wavenator avatar Oct 11 '20 15:10 wavenator

BTW, after fixing this bug I found another bug. Fixing this bug made check_ws_vs_http_matching failing. The test looks like this:

let network_filter = NetworkFilter::parse("|ws://$domain=4shared.com", true).unwrap();
assert!(network_filter.matches(&request::Request::from_urls("ws://example.com", "4shared.com", "websocket").unwrap()));

Changing "4shard.com" to http://4shared.com fixed the problem. Now there is a question here. Does a WebSocket send an http referrer in a URL format or it is just a domain. If it will always send a URL, we should fix the tests to a URL rather than a domain. If there is a possibility they send domains, there is a problem with parsing the source_url from the domain (which is reasonable), and we need to fix that. Either way, I think we should add negative tests there to show that there is a bug right now. As a piece of evidence, the following code passes.

let network_filter = NetworkFilter::parse("|ws://$domain=4shared.com", true).unwrap();
assert!(network_filter.matches(&request::Request::from_urls("ws://example.com", "3shared.com", "websocket").unwrap()));

wavenator avatar Oct 11 '20 16:10 wavenator

It raises another question. Now that I think about it, why do we even compare it against the source_url and not against the URL of the request? When the request_url is an absolute URL, the comparison should be against the URL and not against the source URL.

wavenator avatar Oct 12 '20 13:10 wavenator

Good catches, That check_ws_vs_http_matching test is definitely wrong, those need to be full URLs.

If I understand your last question correctly, we need source_url because there are some rules which are applied differently depending on the page the request is made from. For example, ||a.com$domain=b.com will only block requests to a.com made from b.com. It's also taken into account for $first-party and $third-party options.

Since check_network_urls really depends on source_url to make a correct decision, there should realistically be an error returned if it's not supplied, or doesn't parse correctly.

I'm not actually that familiar with WebSocket referrers, but it shouldn't matter - source_url should just be the top-level URL of the page from which the WebSocket was initiated.

antonok-edm avatar Oct 29 '20 20:10 antonok-edm

If you need any help with fixing the bug, I'd be happy to help.

wavenator avatar Oct 30 '20 08:10 wavenator