
Set of default filters to increase privacy

Open ArenaL5 opened this issue 5 years ago • 47 comments

Post edited to clarify. I hadn't even said what this is for.

I'm planning on installing Request Control on several devices (all of them using Firefox), and I think it could benefit from a good set of default rules to increase privacy.

My scenario is: several platforms transform links in comments to redirectors, e.g. YouTube (any link to sites other than YouTube will point to https://www.youtube.com/redirect?q=ACTUAL_TARGET_URL) and Disqus. These links, supposedly created to increase privacy, may track your movement from site to site and/or inform targeted ads agencies.
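To make the redirector scenario concrete, here is a minimal Python sketch of what "skipping" such a link amounts to. It is only an illustration, not Request Control's code: the function name `unwrap_redirect` and the example.org target are made up, and the only thing taken from the post is the `redirect?q=` shape.

```python
from urllib.parse import urlsplit, parse_qs

def unwrap_redirect(url, param="q"):
    """Return the target URL carried in `param`, or the input unchanged.

    Hypothetical helper: `param` defaults to "q" because that is the
    parameter shown in the YouTube example above.
    """
    query = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes values
    targets = query.get(param)
    return targets[0] if targets else url

# The target is percent-encoded inside the redirector link.
wrapped = "https://www.youtube.com/redirect?q=https%3A%2F%2Fexample.org%2Fpage"
print(unwrap_redirect(wrapped))
```

A link without the parameter is returned as-is, so the helper is safe to apply to ordinary URLs.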

Also, I'm under the impression that Request Control already strips &utm metadata from URLs, but there are other sites that use different codes.
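As a rough picture of what stripping tracking metadata means, the Python sketch below removes every query parameter that a caller-supplied test flags as tracking. The `utm_` prefix check and the example URL are assumptions for illustration; this is not how Request Control itself is implemented.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_params(url, is_tracking):
    """Drop every query parameter whose name satisfies `is_tracking`."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not is_tracking(name)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Assumed convention: tracking parameters start with "utm_".
clean = strip_params(
    "https://example.org/a?id=7&utm_source=x&utm_medium=y",
    lambda name: name.startswith("utm_"),
)
print(clean)
```

Sites that use "different codes" would only need a different predicate, which is why the test is passed in rather than hard-coded.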

My plan is to write rules to avoid these redirectors, to allow users like myself to install Request Control, subscribe to this list, and then forget about it. Kind of like how uBlock Origin works, actually.

I know that you haven't been interested in this...

[The default rules are just examples. You're supposed to create your own rules.](https://github.com/tumpio/requestcontrol/issues/106#issuecomment-523075116)

but I'd nevertheless like to publish these rules somewhere, probably FilterLists. Would you be interested in bundling these rules in your project, or linking to them somewhere in your add-on / README.md?

(I originally suggested adding a mechanism to subscribe to filters, but it doesn't really matter much... all I'd have to do is download the list locally and import it, still pretty convenient.)

ArenaL5 avatar Dec 19 '19 21:12 ArenaL5

The filter list, for the moment, is this. I added some of my own filters on top of your default filters, plus the extra files suggested by @AreYouLoco in #110

To clarify, I'm interested in publishing them either way, but I'd like to know who would manage them. Would you rather have me craft a pull request, or have me upload them to FilterLists, or upload them and manage them yourselves?

I'd like @AreYouLoco's opinion as well.

ArenaL5 avatar Feb 04 '20 16:02 ArenaL5

@ArenaL5, as is, if I try to import your rules, I see the following error:

SyntaxError: JSON.parse: expected double-quoted property name at line 285 column 3 of the JSON data

geeknik avatar Feb 04 '20 16:02 geeknik

Yes, I'm sorry for that. I made a very simple edit without testing it and accidentally added a comma that doesn't belong.

I can import it now with no errors. Please try again.
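For anyone hitting the same import error: a trailing comma inside an object is enough to make strict JSON parsers reject the whole file. The Python sketch below shows the same failure mode with the standard `json` module. The two sample strings are invented, and the exact error wording differs between parsers and versions.

```python
import json

# Invented samples: the only difference is the comma before the closing brace.
broken = '{"active": true, "action": "filter",}'
fixed = '{"active": true, "action": "filter"}'

def first_json_error(text):
    """Return a short description of the first parse error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None

print(first_json_error(broken))  # some parse error; wording depends on the Python version
print(first_json_error(fixed))   # None
```

Running a check like this before publishing a rule list catches the problem without needing to import it into the extension.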

ArenaL5 avatar Feb 04 '20 16:02 ArenaL5

Well, my opinion is the following: I am not a developer of this addon, so I would leave the final decision on how to manage things to @tumpio.

But since it's open source, why not contribute already? I think the easiest way would be to merge my PR first, so others using this addon can benefit from the new rules, and then @ArenaL5, you should make your PR with your cool filters on top of master so it's a one-click merge for @tumpio.

And a good future policy would be to add a CONTRIBUTING file with instructions on how to contribute to the project.

AreYouLoco avatar Feb 04 '20 16:02 AreYouLoco

Some more query parameters that can be useful to remove: https://github.com/Smile4ever/Neat-URL#blocked-parameters

ts1a avatar Feb 04 '20 18:02 ts1a

@ts1a, thank you! I've added them to the list just now.

ArenaL5 avatar Feb 04 '20 23:02 ArenaL5

Thank you for sharing these rules! I'm unable to maintain them myself but I can promote them so that other users can find them. I think they could be added as an opt-in rule set that new users can load when Request Control is installed. There would also need to be a mechanism to load possible updates to the rules when the extension is updated, which there isn't currently, and it would also need to take into account whether the rules can be edited.

tumpio avatar Mar 28 '20 16:03 tumpio

I'm so sorry for the delay! I've been very busy these days and I had almost forgotten about this list.

I'm glad you like them. Yes, those mechanisms will be necessary, especially the update system. Filters already have IDs, so it should be possible just to compare the same filter in the old and new list, and prompt the user to choose. A bit crude, but mostly manageable, and you wouldn't need to add more data to the JSON file.

Or you could add a flag to mark which ones were not edited by the user. Those can be safely rewritten without warning.
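The two ideas above (match rules by ID, and overwrite only the ones the user never touched) could be combined roughly as in this Python sketch. It is purely hypothetical: Request Control has no such update mechanism, and the `userEdited` flag is an invented field name. Only `uuid` comes from the real rule format.

```python
def merge_rules(installed, incoming):
    """Return (merged rules, uuids that need a user decision).

    `installed` and `incoming` are lists of rule dicts keyed by "uuid".
    "userEdited" is a hypothetical flag, not part of the real rule format.
    """
    incoming_by_id = {rule["uuid"]: rule for rule in incoming}
    merged, conflicts = [], []
    for rule in installed:
        new = incoming_by_id.pop(rule["uuid"], None)
        if new is None or new == rule:
            merged.append(rule)              # absent from the update, or unchanged
        elif rule.get("userEdited"):
            merged.append(rule)              # keep the user's version for now
            conflicts.append(rule["uuid"])   # and prompt the user about it
        else:
            merged.append(new)               # never edited: safe to overwrite
    merged.extend(incoming_by_id.values())   # rules that are new in the update
    return merged, conflicts

installed = [
    {"uuid": "a", "title": "old"},
    {"uuid": "b", "title": "mine", "userEdited": True},
]
incoming = [
    {"uuid": "a", "title": "new"},
    {"uuid": "b", "title": "upstream"},
    {"uuid": "c", "title": "added"},
]
merged, conflicts = merge_rules(installed, incoming)
```

Here rule `a` is silently replaced, rule `b` is kept and reported as a conflict, and rule `c` is appended.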

I'm not familiar with the workings of webExtensions, so I can't help with that myself, but I can keep sending filters in the meanwhile.

ArenaL5 avatar Apr 11 '20 00:04 ArenaL5

@ArenaL5, here's a new rule to strip tracking parameters from .gif files blatantly taken from https://github.com/0x01h/gif-tracking-protection because I don't really want to install another extension.

[
  {
    "uuid": "6468e7a9-440d-4727-a09e-c1a5cc386948",
    "pattern": {
      "scheme": "*",
      "host": [
        "*"
      ],
      "path": [
        "*.gif*"
      ]
    },
    "action": "filter",
    "active": true,
    "skipRedirectionFilter": true,
    "trimAllParams": true,
    "title": "Filter%3A%20GIF%20Tracking%20Parameters"
  }
]

geeknik avatar May 04 '20 14:05 geeknik

Added as commit 40503a616e2bc7f567d3ed2552ab67f01b7acced. Thank you so much. :relaxed:

By the way, I think I noticed a weird error in the webextension when testing your rule. As far as I can tell, it shouldn't happen in real webpages, but:

[Screenshot: duplicatedhash]

This only happens if the rule is set to cut any and all parameters (after ?). If you set it to cut only some of them, or to use reverse cut, it works normally.

ArenaL5 avatar May 04 '20 18:05 ArenaL5

There are not only GIFs... check also the images at https://www.24ur.com/ 😉

Cheers

crssi avatar May 05 '20 04:05 crssi

Good catch! I just added your suggestion: commit 2755765c3a11b71dd582384bcab7b3c3644955be

It now filters URLs with .png?, .gif?, .jpg?, .jpeg? and .webm?, case-insensitive. I didn't include SVG or obscure formats because I don't know of any website which uses them for tracking.
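As a sketch of the matching described here, the Python snippet below tests whether a URL contains one of those extensions immediately followed by `?`, ignoring case. The extension list is copied from this comment; the pattern in the committed rule may differ, and the function name is made up.

```python
import re

# Extensions as listed in the comment above; "jpe?g" covers .jpg and .jpeg.
IMAGE_WITH_QUERY = re.compile(r"\.(png|gif|jpe?g|webm)\?", re.IGNORECASE)

def has_image_query(url):
    """True if the URL has one of the listed extensions directly followed by '?'."""
    return IMAGE_WITH_QUERY.search(url) is not None

print(has_image_query("https://example.org/pic.PNG?track=1"))  # True
print(has_image_query("https://example.org/pic.png"))          # False
```

Requiring the `?` means plain image URLs without parameters are never touched.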

Tell me if I missed anything!

ArenaL5 avatar May 14 '20 17:05 ArenaL5

Tell me if I missed anything!

Page like https://www.kickstarter.com/ 😉

crssi avatar May 14 '20 17:05 crssi

Something like this maybe?:

[
  {
    "uuid": "de529bb9-2ec2-41ac-800f-d1cb8247f622",
    "pattern": {
      "scheme": "*",
      "host": [
        "*"
      ],
      "includes": [
        "/\\.(gif|jpg|jpeg|png|webp)\\?/"
      ],
      "path": [],
      "allUrls": true
    },
    "action": "filter",
    "active": true,
    "paramsFilter": {
      "values": [
        "auto",
        "crop",
        "fit",
        "frame",
        "h",
        "q",
        "s",
        "w"
      ],
      "invert": true
    },
    "title": "Filter tracking parameters in images",
    "tag": "filter-img",
    "types": [
      "image"
    ]
  }
]

crssi avatar May 14 '20 18:05 crssi

^^ but performance wise, I would skip the following out:

      "includes": [
        "/\\.(gif|jpg|jpeg|png|webp)\\?/"
      ],

crssi avatar May 14 '20 18:05 crssi

I saw your new messages while I was typing mine. I was thinking of a different approach, but maybe we'll need to combine both. I'm only a bit worried about obscured metadata parameters.

I investigated a bit and found out that Kickstarter depends on Imgix to serve images. Imgix is a web service that describes itself like this:

Powerful image processing, simple API. imgix transforms, optimizes, and intelligently caches your entire image library for fast websites and apps using simple and robust URL parameters.

So, great start. The whole service runs counter to this filter. Isn't that annoying hahah.

The best option seems to be to include your parameter whitelist, and also to whitelist imgix.net altogether, but I'm not very sure about s... and I worry about them having metadata parameters in their URL. (I couldn't find the meaning of s in their public API.)

^^ but performance wise, I would skip the following out:

      "includes": [
        "/\\.(gif|jpg|jpeg|png|webp)\\?/"
      ],

We can do without the regexp, certainly. I just noticed we can apply filters only to certain types of content (I had forgotten about it... :sweat_smile:).

I've done a couple tests and creating a wild-card filter that only filters images seems to work pretty well:

[Screenshot: reply]

ArenaL5 avatar May 14 '20 18:05 ArenaL5

Or we can go with your approach, and whitelist imgix.net in a different rule. Maybe it makes what's happening more obvious to the user.

I'd still insist on having some provision for imgix.net, and enabling it by default, because it has like a hundred different parameters and disabling it by default will prevent you from seeing some images as intended.

ArenaL5 avatar May 14 '20 18:05 ArenaL5

It might be necessary to exclude these values from trimming:

format
size
height
width

geeknik avatar May 14 '20 22:05 geeknik

Alright.

So, taking everything into account... what do you both think of this?

[
  {
    "title": "Filter tracking parameters in images",
    "uuid": "6468e7a9-440d-4727-a09e-c1a5cc386948",
    "pattern": {
      "scheme": "*",
      "host": [
        "*"
      ],
      "path": [
        "*"
      ],
      "excludes": [
        "http://*.imgix.net/",
        "https://*.imgix.net/"
      ],
      "allUrls": true
    },
    "action": "filter",
    "active": true,
    "description": "Inspired by GIF Tracking Protection webextension",
    "skipRedirectionFilter": true,
    "tag": "filter-img",
    "types": [
      "image"
    ],
    "paramsFilter": {
      "values": [
        "auto",
        "crop",
        "fit",
        "format",
        "frame",
        "h",
        "height",
        "q",
        "s",
        "size",
        "w",
        "width"
      ],
      "invert": true
    }
  }
]
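For readers unfamiliar with the inverted parameter filter, the rule above is meant to keep only the listed parameters and drop everything else. A minimal Python sketch of that behaviour follows. It is an assumption about the intended effect, not the extension's code, and it ignores the `excludes` and `types` parts of the rule.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# The whitelist from the rule above.
KEEP = {"auto", "crop", "fit", "format", "frame", "h", "height",
        "q", "s", "size", "w", "width"}

def keep_only(url, allowed=KEEP):
    """Keep query parameters named in `allowed`; drop all others."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowed
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(keep_only("https://example.org/img.jpg?w=300&h=200&uid=42"))
```

Because the list is a whitelist, any service that needs a parameter outside it will see that parameter removed unless its host is excluded or the list is extended.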

ArenaL5 avatar May 14 '20 22:05 ArenaL5

We should probably whitelist https://www.google.com/recaptcha/api2/payload*.

This rule turns https://www.google.com/recaptcha/api2/payload?p=xxx&k=xxx into https://www.google.com/recaptcha/api2/payload thus breaking half the Internet which relies on that garbage. Should investigate the hCaptcha implementation as well.

geeknik avatar May 14 '20 22:05 geeknik

An unintended consequence of this rule is increased privacy on YouTube as this rule trims https://www.youtube.com/api/stats/watchtime?ns=yt&el=detailpage&cpn=VIDEO_ID&docid=DUNNO&ver=2&referrer=OBVIOUS&cmt=33.274&fmt=137&fs=0&rt=1316.676&of=NOT_SURE&euri&lact=8301&cl=SOMETHING&state=paused&vm=SOMETHING_ELSE&volume=17&subscribed=1&cbr=Firefox&cbrver=78.0&c=WEB&cver=2.20200514.04.00&cplayer=UNIPLAYER&cos=X11&autoplay=1&cr=US&uga=o50&len=671.777&rtn=1356&afmt=140&idpj=-5&ldpj=-30&rti=1316&muted=0&st=25.695&et=33.274 down to https://www.youtube.com/api/stats/watchtime thus denying the Google machine valuable information about what I'm doing. I haven't noticed any differences in video quality or playback. YMMV

geeknik avatar May 14 '20 23:05 geeknik

Wow, there are a lot of legitimate uses for URL parameters in images.

Whitelisting reCaptcha: yes, it's absolutely necessary. The hCaptcha test on their webpage seems to work even when a couple of requests are redirected... but I'd need a different page that implements hCaptcha to confirm it.

Another exception: businesses owned by Facebook (Facebook, Instagram, WhatsApp) use an Imgix-like system to put timestamps and hashes in their images. We need to whitelist them so we can see any photo at those sites.

_nc_ht, _nc_ohc, oe, oh, ~url~ seem to work well for Instagram and Facebook. For Whatsapp Web, ~either whitelisting the e parameter or the https://web.whatsapp.com/pp?e= URL fragment should work.~ there's a better option: create a new rule to force URL filtering for https://web.whatsapp.com/pp?e=. That way, the result of that will be filtered again.

The only complication is that the URL filtering needs to be done before trimming URL parameters, but that can be done by whitelisting https://web.whatsapp.com/. The filtered URL begins with https://pps.whatsapp.net/, which is not whitelisted, so it will be filtered again.

That was confusing. Right now I'm using these test rules:

[
  {
    "title": "Filter tracking parameters in images",
    "uuid": "6468e7a9-440d-4727-a09e-c1a5cc386948",
    "pattern": {
      "scheme": "*",
      "host": [
        "*"
      ],
      "path": [
        "*"
      ],
      "excludes": [
        "http://*.imgix.net/",
        "https://*.imgix.net/",
        "https://web.whatsapp.com/"
      ],
      "allUrls": true
    },
    "action": "filter",
    "active": true,
    "description": "Based on GIF Tracking Protection webextension",
    "skipRedirectionFilter": true,
    "tag": "filter-img",
    "types": [
      "image"
    ],
    "paramsFilter": {
      "values": [
        "_nc_ht",
        "_nc_ohc",
        "auto",
        "crop",
        "fit",
        "format",
        "frame",
        "h",
        "height",
        "oe",
        "oh",
        "q",
        "size",
        "w",
        "width"
      ],
      "invert": true
    }
  },
  {
    "uuid": "297fb0c7-d052-4031-9947-fc7a9b7690af",
    "pattern": {
      "scheme": "*",
      "host": [
        "web.whatsapp.com"
      ],
      "path": [
        "/pp?*"
      ]
    },
    "types": [
      "image"
    ],
    "action": "filter",
    "active": true,
    "title": "WhatsApp Web images"
  }
]

An unintended consequence of this rule is increased privacy on YouTube as this rule trims https://www.youtube.com/api/stats/watchtime?ns=yt&el=detailpage&cpn=VIDEO_ID&docid=DUNNO&ver=2&referrer=OBVIOUS&cmt=33.274&fmt=137&fs=0&rt=1316.676&of=NOT_SURE&euri&lact=8301&cl=SOMETHING&state=paused&vm=SOMETHING_ELSE&volume=17&subscribed=1&cbr=Firefox&cbrver=78.0&c=WEB&cver=2.20200514.04.00&cplayer=UNIPLAYER&cos=X11&autoplay=1&cr=US&uga=o50&len=671.777&rtn=1356&afmt=140&idpj=-5&ldpj=-30&rti=1316&muted=0&st=25.695&et=33.274 down to https://www.youtube.com/api/stats/watchtime thus denying the Google machine valuable information about what I'm doing. I haven't noticed any differences in video quality or playback. YMMV

This is glory.

ArenaL5 avatar May 14 '20 23:05 ArenaL5

For Instagram, we might be able to get away with just whitelisting scontent-*-*.cdninstagram.com. An example hostname is scontent-lga3-1.cdninstagram.com.

geeknik avatar May 14 '20 23:05 geeknik

That will work, but requests to that server will still have some optional filler. I think whitelisting URL parameters would be better.

ArenaL5 avatar May 14 '20 23:05 ArenaL5

I see where Instagram uses _nc_ht, _nc_ohc, oh, and oe; however, with the current rules as they are, those are stripped away and images still load whether I'm logged in or logged out.

The only place where I'm not seeing images load is user avatars in private messages.

So apparently to view a user's avatar, you need these:

_nc_ht = URL signature
_nc_ohc = URL signature
oe = URL timestamp
oh = URL hash

Example: https://scontent-lga3-1.cdninstagram.com/v/t51.2885-19/s150x150/84176507_208023943903904_2275778092112805888_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com&_nc_ohc=90_1aiR5QhMAX_aZWG-&oh=a4c150c318a64aebac4d56a57c9ddde3&oe=5EE7AA91

Remove any of those values and you can see the error.

geeknik avatar May 14 '20 23:05 geeknik

Yes, that's correct. I tested it by visiting empty, random Instagram accounts while logged out, so I didn't notice you can always see photos.

But in Facebook's case, I can't see anything unless I whitelist all of those.

ArenaL5 avatar May 14 '20 23:05 ArenaL5

Some reading to digest: https://github.com/ghacksuserjs/ghacks-user.js/issues/149

crssi avatar May 15 '20 07:05 crssi

Some reading to digest: https://github.com/ghacksuserjs/ghacks-user.js/issues/149

That's a lot of rules to read, test, and transcribe calmly and slowly. I'll give it a look.

For the moment, I pushed a new commit (237b045872e5f24ea872a897eef87f43dddf8cf0). It includes this code sample and an exemption for ReCaptcha.

ArenaL5 avatar May 15 '20 16:05 ArenaL5

Food Network has some awful URLs in their newsletter:

https://www.foodnetwork.com/recipes/food-network-kitchen/grilled-white-pizza-with-fennel-salad-3364631?%24deeplink_path=itk%2Fv4%2Fclasses%2Fdc70268e-607f-40d5-b47d-7faef3bade56&%24deep_link=true&nl=FNKS_052020_threeup-2&campaign=FNKS_052020_threeup-2&bid=20370414&c32=c6820913aafbc64fbaa51c7f89257b80b2fd7d321&ssid=FNK_Registration_Event&sni_by=&sni_gn=&%243p=e_sailthru&%24original_url=https%3A%2F%2Fwww.foodnetwork.com%2Frecipes%2Ffood-network-kitchen%2Fgrilled-white-pizza-with-fennel-salad-3364631%3F%2524deeplink_path%3Ditk%2Fv4%2Fclasses%2Fdc70268e-607f-40d5-b47d-7faef3bade56%26%2524deep_link%3Dtrue%26nl%3DFNKS_052020_threeup-2%26campaign%3DFNKS_052020_threeup-2%26bid%3D20370414%26c32%3Dc682013aafbc64fbaa51c7f89257b80b2fd7d321%26ssid%3DFNK_Registration_Event%26sni_by%3D%26sni_gn%3D&_branch_match_id=791768214211048437

Loading https://www.foodnetwork.com/recipes/food-network-kitchen/grilled-white-pizza-with-fennel-salad-3364631 works just fine, so all that other tracking nonsense can go.

geeknik avatar May 20 '20 21:05 geeknik

Sorry, I haven't been checking this lately...

Your URL is messed up. I'm not sure how we would program a rule:

  • Truncate everything after ?, only for foodnetwork.com?
  • Blacklist deep_link and deeplink*?
  • Redirect to original_url? (seems like it might break some pages, especially after a login screen).
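The first option in the list above, sketched in Python for illustration only: drop the whole query string, but just for one site. The function name is made up, and the shortened test URL reuses the path and two of the parameters from the newsletter link.

```python
from urllib.parse import urlsplit, urlunsplit

def drop_query_for_host(url, host_suffix="foodnetwork.com"):
    """Remove everything after '?' when the host is `host_suffix` or a subdomain."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host == host_suffix or host.endswith("." + host_suffix):
        return urlunsplit(parts._replace(query="", fragment=""))
    return url  # other sites are left untouched

link = ("https://www.foodnetwork.com/recipes/food-network-kitchen/"
        "grilled-white-pizza-with-fennel-salad-3364631"
        "?campaign=FNKS_052020_threeup-2&bid=20370414")
print(drop_query_for_host(link))
```

This is the bluntest of the three options: it cannot break on new parameter names, but it would also remove any parameter the site legitimately needs.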

Also, they changed the page; now both links give me a 404 error. Could you post a different link?

ArenaL5 avatar May 29 '20 11:05 ArenaL5

@ArenaL5 Hi, I hope you're well. I am trying to find your set of filters but the link (https://gist.github.com/ArenaL5/21832548f9f56c867cf7527688814bb0) gives a 404. I first found my way here using Google (last resort), searching for a disq.us filter or redirector. The URL tracking they do is really getting on my nerves with its added, extra slow, loading time.

Funny thing is that I tried "Skip Redirect", NeatURL, and ClearURL. None of them seems to filter or redirect disq.us.

If you need an example of a disqus link I mean the padding disqus adds when someone posts a link in a post.

Looks like this: https://disq.us/url?url=https://example.org/some/site/:tF3yYzFf7r84F4MitftbeTekwLg&cuid=2271717

I only want to go to example.org/some/site instead of being routed through their tracking system. Hope to hear from you or anyone still looking in here.

ThurahT avatar Jan 29 '21 19:01 ThurahT

@ThurahT Thank you for your concern. I'm very fortunately doing alright; I hope you are, too.

The filter you're looking for is already part of Request Control's defaults and should work with your example URL.

I deleted the GitHub Gist because it was obsolete and @tumpio already merged my latest updates into the master branch (along with updates from other contributors). To use the newest filters, you can either build Request Control from source, or import the 13 rulesets in this folder.

If you're only adding filters for Disqus, you might be interested in this one as well (it's in the same file); it downloads embedded images from their original location instead of Disqus's servers.

Feel free to report back if you have any problem or suggestion.

ArenaL5 avatar Jan 30 '21 19:01 ArenaL5

Thanks a bunch, @ArenaL5! I finally found my sole and only redirector. RC is so much more though, so I am grateful. Just need to learn js and regexp and I should be able to create my own rules in the future. But until that distant future, this is great!

I was confused about the default ruleset since it looks like this in my RC from AMO (1.15.5) : https://i.imgur.com/K6tqu8w.png But I see now that I can import all those rulesets from the repo. Really nice. Thanks again!

ThurahT avatar Jan 31 '21 03:01 ThurahT

Glad I could help!

Also, you don't need to know JavaScript to create your own rules. Regexps, on the other hand... you should, but the basics are much easier than you think. You can always test your rules with the Test selected button anyways.

EDIT: To be precise, the rulesets can't have any JavaScript. JavaScript would be useful if you want to improve Request Control itself, or for other addons.

ArenaL5 avatar Jan 31 '21 15:01 ArenaL5

filter-images-4 breaks the thumbnails at the top of an Instagram profile.

[Screenshot: 2021-06-15_08-58]

geeknik avatar Jun 15 '21 13:06 geeknik

I'm sorry ― it's been 12 days already??

I got the email a while ago but I've been unusually busy these days, so I couldn't get to it. I'm writing a PR to fix this and other issues.

Three parameters are being filtered right now: tp, ccb and _nc_sid. Oddly, if you copy the filtered URL, it loads the image correctly, but it doesn't work on the Instagram profile; I reckon Instagram checks the signature of every image it loads. (Either that, or there's something I'm missing.)

For the moment, I'm whitelisting all three parameters: b25e9627f4a6f12d92fd9293337019bf16a42202

ArenaL5 avatar Jun 27 '21 22:06 ArenaL5

Your guess is as good as any, I could not figure out a combination that would load the images. What can ya do? 🤷🏻‍♂️

geeknik avatar Jun 27 '21 22:06 geeknik