FiltersRegistry

Privacy: ClearURLs

Open • TPS opened this issue 1 year ago • 13 comments

Prerequisites

  • [X] I checked the documentation and understood it;
  • [X] I checked to make sure that this issue has not already been filed;

Problem description

The ClearURLs database might be transformed into a powerful privacy-enhancing filterlist &/or userscript.

Proposed solution

The rule specs at https://docs.clearurls.xyz/latest/specs/rules/ would be essential for transforming this into something end-usable.

Additional information

Originally found via https://github.com/svenjacobs/leon/discussions/315#discussioncomment-9809441, where several interrelated projects are thinking of how to incorporate this database themselves.

TPS, Jun 23 '24 11:06

Alpha/Beta:

https://github.com/DandelionSprout/adfilt/blob/master/ClearURLs%20for%20uBo/clear_urls_uboified.txt
https://raw.githubusercontent.com/DandelionSprout/adfilt/master/ClearURLs%20for%20uBo/clear_urls_uboified.txt

krystian3w, Jun 25 '24 20:06

Definitely also see https://github.com/DandelionSprout/adfilt/discussions/163

TPS, Jun 25 '24 21:06

Legitimate URL Shortener was added years ago: https://github.com/AdguardTeam/FiltersRegistry/tree/master/filters/ThirdParty/filter_251_LegitimateURLShortener - https://github.com/AdguardTeam/FiltersRegistry/commit/65694c61d5fc8ea98782285edc14291c80d8c73a (https://github.com/AdguardTeam/FiltersRegistry/issues/401)

krystian3w, Jun 25 '24 21:06

I do use LUS, but am hoping to improve coverage for these trackers.

They're not identical, ~~but now I think LUS is a derivative of ClearURLs (& probably other sources), so maybe this is a duplicate in some sense?~~ @DandelionSprout, it would help if you could comment on the relationship between the 2.

TPS, Jun 25 '24 23:06

Conflict of interest disclaimer: I am the assistant maintainer of the Actually Legitimate URL Shortener Tool, and the current maintainer of the ClearURLs for uBo list (I did not create the original ClearURLs for uBo list; credit for that goes to rustysnake).

> DandelionSprout's LUS is a derivative of ClearURLs (& probably other sources)

It is not. While a few filters have been copied from elsewhere (with credit), most were added manually, based either on user reports or on tracking parameters Imre and I found. Thank you.

iam-py-test, Jun 25 '24 23:06

@iam-py-test Thanks very much for answering. 🙇🏾‍♂️ Could you comment on how different the contents of the 2 lists are from each other?

TPS, Jun 26 '24 00:06

The Actually Legitimate URL Shortener, as described, is a set of rules manually added by Imre (DandelionSprout) and me. ClearURLs for uBo uses a Python script to convert the ClearURLs rules into a filterlist for uBlock Origin and AdGuard (basically what you requested here). There are a few modifications to remove problematic rules, but largely it's just the ClearURLs rules. Thanks.
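To make the conversion idea concrete, here is a minimal Python sketch. It is not the ClearURLs for uBo script (its code isn't shown in this thread). It assumes, per our reading of the ClearURLs rules spec, that a provider lists tracking parameters as regex patterns in a `rules` array, and it skips the provider's `urlPattern`, taking a target domain from the caller instead. `provider_to_rules` is a hypothetical helper. Plain parameter names become literal `$removeparam=name` values; anything else becomes a regex anchored to the start of the `name=value` pair, which is how we understand uBO and AdGuard apply `$removeparam` regexes. Only commas are escaped here; exact escaping requirements may differ between blockers, so treat that as an assumption.

```python
import re
from typing import Optional

# Parameter names that contain no regex metacharacters.
PLAIN_NAME = re.compile(r"[\w-]+")


def provider_to_rules(param_patterns: list[str], domain: Optional[str] = None) -> list[str]:
    """Turn ClearURLs-style parameter patterns into $removeparam rules.

    Hypothetical helper: `domain` is supplied by hand here instead of being
    derived from the provider's urlPattern regex.
    """
    prefix = f"||{domain}^" if domain else ""  # no prefix = global rule
    rules = []
    for pattern in param_patterns:
        if PLAIN_NAME.fullmatch(pattern):
            value = pattern  # literal parameter name
        else:
            # Regex form: anchor to the start of "name=value"; commas are
            # escaped because they separate filter options.
            value = "/^(?:" + pattern.replace(",", r"\,") + ")=/"
        rules.append(f"{prefix}$removeparam={value}")
    return rules


for rule in provider_to_rules(["utm_source", "pd_rd_[a-z]*"], "amazon.com"):
    print(rule)
# ||amazon.com^$removeparam=utm_source
# ||amazon.com^$removeparam=/^(?:pd_rd_[a-z]*)=/
```

Mapping `urlPattern` regexes onto `||domain^` anchors is the hard part of a real converter and is deliberately left out.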

iam-py-test, Jun 26 '24 01:06

In theory, I could try merging relevant entries from ClearURLs into LUS, which I can only presume would be a win-win for most parties.

DandelionSprout, Jun 26 '24 01:06

@DandelionSprout 🙇🏾‍♂️ Actually, if the contents are that different, it'd make sense to keep them separate, & offer each as AG options to supplement each other & AG's other Privacy filterlists. OTOH, if the included rules overlap significantly, then it would make sense to use 1 as another source for the other, to keep down duplication.

TPS, Jun 26 '24 03:06

So, I ran a comparison this morning about whether ClearURLs had any coverage that LUS didn't. I decided to test with Amazon, a high-coverage site in both lists.

LUS had well above 80 entries for Amazon (70 of them being specific entries). Only 2 entries that made sense (i.e. not ones like keywords or _encoding) were in ClearURLs but not in LUS.
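A per-site comparison like this can be approximated with a set difference. The following is a simplified Python sketch, not the method actually used. It only looks at `$removeparam` rules, treats a rule as belonging to a site when the site name appears as a label in a `||host^` pattern or a `domain=` entry, compares parameter values as plain strings, and ignores global rules (so a parameter that list `b` removes globally is still reported as missing). The helper names and sample rules are made up for illustration.

```python
def _scoped_hosts(pattern: str, options: list[str]) -> list[str]:
    """Hosts a rule is limited to, from `||host^` and `domain=` (negations skipped)."""
    hosts = []
    if pattern.startswith("||"):
        host = pattern[2:]
        for stop in "^/*":
            host = host.split(stop, 1)[0]  # cut at the first separator
        hosts.append(host)
    for opt in options:
        if opt.startswith("domain="):
            hosts += [d for d in opt[len("domain="):].split("|") if not d.startswith("~")]
    return hosts


def params_for_site(rules: list[str], site: str) -> set[str]:
    """Parameters removed by $removeparam rules scoped to `site` (e.g. "amazon")."""
    found: set[str] = set()
    for raw in rules:
        rule = raw.strip()
        if not rule or rule.startswith(("!", "@@")):
            continue  # skip blanks, comments and exception rules
        pattern, sep, opts = rule.partition("$")
        if not sep:
            continue  # no modifiers, so no $removeparam
        options = opts.split(",")
        params = {o[len("removeparam="):] for o in options if o.startswith("removeparam=")}
        # Global rules have no scoped hosts and are deliberately left out.
        if params and any(site in h.split(".") for h in _scoped_hosts(pattern, options)):
            found |= params
    return found


def only_in(a: list[str], b: list[str], site: str) -> set[str]:
    """Site-specific parameters that list `a` removes but list `b` does not."""
    return params_for_site(a, site) - params_for_site(b, site)


clearurls_like = [
    "||amazon.com^$removeparam=pd_rd_w",
    "||amazon.de^$removeparam=pd_rd_r",
    "$removeparam=ref_,domain=amazon.co.uk",
]
lus_like = ["||amazon.com^$removeparam=pd_rd_w", "$removeparam=utm_source"]
print(sorted(only_in(clearurls_like, lus_like, "amazon")))  # ['pd_rd_r', 'ref_']
```

The result would still need a manual pass, as described above, to discard entries that don't make sense.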

Although I do have conflicts of interest in the matter, I'd say that at this point ClearURLs has been obliterated in comparison. I give iam-py-test full 100% rights to make the calls on the following, with no interference from me, but I personally am getting unsure if a ClearURLs list conversion would be considered necessary nowadays. 😓

DandelionSprout, Jun 26 '24 07:06

That's reasonable methodology. Would it be possible to be more comprehensive over domain variety, like https://github.com/StevenBlack/hosts/issues/1181#issuecomment-608229213 is for TLD variety? I've a hunch that far less well-known sites than Amazon may have wider coverage in ClearURLs.

TPS, Jun 27 '24 11:06

> Possible to be more comprehensive over domain variety, like https://github.com/StevenBlack/hosts/issues/1181#issuecomment-608229213?

Given both lists have many global (applies to all websites) rules, measuring such coverage would be difficult.
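To show why global rules muddy a per-site count, this Python sketch splits a list into global and site-scoped `$removeparam` rules, detecting scope only from `||host` patterns and `domain=` options (a simplification; other scoping forms are not handled). A global rule covers Amazon, and every other site, without naming any of them, so a per-site tally never sees it. `split_by_scope` is a hypothetical helper and the sample rules are invented.

```python
def split_by_scope(rules: list[str]) -> tuple[list[str], list[str]]:
    """Split $removeparam rules into (global, site_scoped) lists."""
    global_rules: list[str] = []
    site_rules: list[str] = []
    for raw in rules:
        rule = raw.strip()
        if not rule or rule.startswith(("!", "@@")):
            continue  # skip blanks, comments and exception rules
        pattern, sep, opts = rule.partition("$")
        options = opts.split(",") if sep else []
        if not any(o.startswith("removeparam") for o in options):
            continue  # not a parameter-removal rule
        scoped = pattern.startswith("||") or any(o.startswith("domain=") for o in options)
        (site_rules if scoped else global_rules).append(rule)
    return global_rules, site_rules


sample = [
    "$removeparam=utm_source",
    "||amazon.com^$removeparam=pd_rd_w",
    "$removeparam=tag,domain=amazon.de",
    "! a comment",
]
global_rules, site_rules = split_by_scope(sample)
print(len(global_rules), len(site_rules))  # 1 2
```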

iam-py-test, Jun 27 '24 13:06

It is definitely worth testing which exception rules deactivate the global removeparam rules (AdGuard only):

> `removeparam` rules can also be disabled by `$document` and `$urlblock` exception rules. But basic exception rules without modifiers do not do that. For example, `@@||example.com^` will not disable `$removeparam=p` for requests to example.com, but `@@||example.com^$urlblock` will.
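The quoted behaviour can be expressed as a small predicate. This Python toy model encodes only what the quote states (`$document` and `$urlblock` exceptions disable `$removeparam`; a bare exception does not). It does not check whether the exception's pattern matches the request and does not model any other AdGuard exception types. `disables_removeparam` is a hypothetical name.

```python
# Exception modifiers that, per the quoted AdGuard docs, also switch off $removeparam.
DISABLING_MODIFIERS = {"document", "urlblock"}


def disables_removeparam(exception_rule: str) -> bool:
    """True if this @@ rule's modifiers would disable $removeparam (toy model)."""
    if not exception_rule.startswith("@@"):
        raise ValueError(f"not an exception rule: {exception_rule!r}")
    _, sep, opts = exception_rule.partition("$")
    # Modifier names only, e.g. "domain=a.com" -> "domain".
    modifiers = {o.split("=", 1)[0] for o in opts.split(",")} if sep else set()
    return not modifiers.isdisjoint(DISABLING_MODIFIERS)


for rule in ("@@||example.com^", "@@||example.com^$urlblock", "@@||example.com^$document"):
    print(rule, "->", disables_removeparam(rule))
# @@||example.com^ -> False
# @@||example.com^$urlblock -> True
# @@||example.com^$document -> True
```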

If parameter removal gets disabled that way, a userscript ("user.js") that edits parameters through an API will probably work better on such locked-down sites.

https://adguard.com/kb/general/ad-filtering/create-own-filters/#urlblock-modifier
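A userscript alternative like the one mentioned above boils down to rewriting a URL's query string before it is used. A real userscript would be JavaScript running in the page; this Python sketch only shows the parameter-editing step, with a made-up `TRACKING_PARAMS` set standing in for a real list. Note that `urlencode` re-encodes the kept values, so the output may differ byte-for-byte from the input even where nothing was removed.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative parameter names only; a real list would come from LUS/ClearURLs.
TRACKING_PARAMS = frozenset({"utm_source", "utm_medium", "fbclid"})


def strip_params(url: str, blocked: frozenset[str] = TRACKING_PARAMS) -> str:
    """Return `url` with every query parameter named in `blocked` removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in blocked
    ]
    # Rebuild the URL; an empty query drops the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_params("https://example.com/p?id=7&utm_source=x&fbclid=y#top"))
# https://example.com/p?id=7#top
```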

krystian3w, Jun 27 '24 17:06

Hi! According to our rules, the filter should be one oriented towards browser content blockers, like the Legitimate URL Shortener mentioned here.

zloyden, Dec 06 '24 16:12

Since one list currently pulls in 100% of the rules of the other, it is already a bit like that (I have not checked how this is done, for example whether duplicates are reduced on the script side before a list update is published).
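If a de-duplication step were part of publishing a merged list, a minimal version could look like this Python sketch. `merge_dedup` is a hypothetical helper, not taken from either list's tooling. It keeps the first copy of each rule across the input lists and drops blank lines and `!` comments. It only catches exact textual duplicates; rules that are equivalent but written differently, such as the same options in another order, slip through.

```python
def merge_dedup(*lists: list[str]) -> list[str]:
    """Concatenate filter lists, keeping only the first copy of each rule."""
    seen: set[str] = set()
    merged: list[str] = []
    for rules in lists:
        for raw in rules:
            rule = raw.strip()
            if not rule or rule.startswith("!"):
                continue  # drop blank lines and comments
            if rule not in seen:
                seen.add(rule)
                merged.append(rule)
    return merged


a = ["! List A", "$removeparam=utm_source", "||amazon.com^$removeparam=pd_rd_w"]
b = ["$removeparam=utm_source", "$removeparam=fbclid"]
print(merge_dedup(a, b))
# ['$removeparam=utm_source', '||amazon.com^$removeparam=pd_rd_w', '$removeparam=fbclid']
```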

The only thing that worries me is something like the mode that deactivates cosmetic filtering on HTTPS-sensitive sites: there, with rules, we can also deactivate parameter removal completely.

krystian3w, Dec 06 '24 19:12