
[Sticky: Please read first] Broken website / login ! Now what ?

Cimbali opened this issue 5 years ago • 15 comments

Is there a website that is not working anymore? Maybe reloading infinitely? Here’s what you can do:

1. Whitelist the URL

  1. Open the CleanLinks menu by clicking on the toolbar button,
  2. Search for the problematic link (possibly filtering with “Embedded Link” only),
  3. Click the “Whitelist Embedded URL” button.

You’re done ! In very few cases there might be multiple requests redirecting to the same page.

2. Consider contributing the cleaning/whitelisting rule

This is important as CleanLinks has no telemetry at all, not even anonymous, and I don’t visit every website of the internet. So I can’t possibly gather all of the needed information to build the perfect cleaning rules for everyone!

Do you think the website is used by many people, or could be useful to the wider community as a default rule? Please open an issue or post it here as a comment and I’ll try to integrate it. What I need to know is:

  • on which pages does the problem happen?
  • which parameters should be removed or whitelisted for the website to work?

You can search for the rule by filtering by website in CleanLinks’ configuration page, or in the rules file that you can export from that page.

Cimbali avatar Mar 15 '20 12:03 Cimbali

I'm sorry, but I don't understand the interface at all. Check boxes are in the dropdown menu, in the settings interface, and probably in the secret settings panel that I haven't found. I've tried using the whitelist button several times; it doesn't help. I tried looking at the exported JSON settings - maybe I could make sense of that - no. At the very place where the settings lists items that are going to be cleaned/changed/whatever, there is no way to change them. I can add items, but the huge list that's already there is immutable.

I'm ready to give up.

Grossdm avatar Mar 26 '20 09:03 Grossdm

Hi @Grossdm, sorry for the confusion. There’s no secret settings panel. There are a few help tooltips that should guide you on what the interfaces do, and also a GitHub wiki that explains what CleanLinks does and what the interfaces are. Below is a summary of that information.

In the popup:

  • Check boxes (soon to be restyled #87) in the dropdown menu filter the history of cleaned links that is displayed there
  • Clicking a cleaned link in the history selects it,
  • Buttons at the bottom modify rules for that selected link.

In the settings:

  • Greyed items are inherited from parent rules.
  • To counteract a greyed-out item, either
    • add the same item to the whitelist of the rule you’re editing − this overrules the removals, or
    • go to the parent rule (there will be a quick link for that soon, see #87 too − in general it’s just *.* anyway).
  • Rules need to be saved after being modified.

In the exported JSON, URLs are matched hierarchically:

  • every key starting with a dot is a part of the domain being matched,
  • every other key is a regular expression matching the URL’s path,
  • the actions key lists which cleaning actions to take.

The import/export function is meant more for backing up your rule set than for editing it manually.
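
To give a rough idea of the shape, here is a sketch of that structure with placeholder names (written as a JavaScript object so it can carry comments − a real export is plain JSON, and the exact domain parts, path regex, parameter names and action names will differ):

```js
// Sketch of the hierarchical rules structure described above.
// All names here are placeholders for illustration only.
const rules = {
  ".com": {                           // dot-keys spell out the domain…
    ".example": {                     // …here: example.com
      "/login/.*": {                  // any other key is a regex on the URL path
        "actions": {                  // which cleaning actions to take
          "whitelist": ["redirect_url"],  // parameters to keep as-is
          "remove": ["utm_.*"]            // parameters to strip
        }
      }
    }
  }
};
```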

Cimbali avatar Mar 26 '20 10:03 Cimbali

Here is what I think about the recent changes.

The extension has become a total annoyance rather than a helper. I'm afraid that I will have to remove it from my browser because it doesn't work as it should anymore. And here is the list of things which totally piss me off:

  1. The worst thing is that on some sites I can't log in at all. And it shouldn't be the user's problem to whitelist or add exceptions at all. If you don't agree with that, then you shouldn't make such extensions and expect people in great numbers to enjoy using them. Also, I don't have time to deal with this crap; it's easier for me to just disable the extension than to waste time researching the problem.

  2. Links are not always cleaned when I expect them to be. This is amusing really, because copying a link through the context menu cleans it without issues.

  3. In the past it was possible to disable the extension by clicking its icon, so I could deal with infuriating annoyances temporarily and then switch it back on later when I needed it, but now this functionality is totally broken. The extension switches back on every time I close the current tab. WTF? You think this is normal? If so, I don't agree with you.

  4. There is no option to properly adjust the extension's behavior to the user's needs; the settings are sparse and mostly useless. Also, I see that in the latest version you added new functionality to disable some of the functions, but those settings aren't saved when I close or even refresh the tab. This is just dumb, I can't describe it otherwise. Then what's the use of them?

So with all that, I think I will just delete this extension after many years of usage and go looking for alternatives. And you just ruined it, because it seems that you don't understand the philosophy of usability. In any piece of software, usability must be the first priority; and most importantly, you don't have the right to decide what is good and what is bad for the end user. Your role as a developer is to give the user the tools to make such decisions.

RazielZnot avatar Mar 29 '20 23:03 RazielZnot

@RazielZnot sorry to hear that. You can always switch to pre-v4 versions through the versions list.

A lot of the recent changes have been to give the user more control over what the extension does.

  • some sites I can't login at all

  • Not always links are cleaned

If you report which sites, I can respectively add them in the default whitelist and look at them to see why they are not cleaned.

  • this shouldn't be the user's problem to whitelist or add exceptions at all

I already needed to do this on the previous version of CleanLinks. Apparently you disabled CleanLinks when it broke websites instead of adding them to the exceptions list. Ideally we wouldn’t need any of those things, I agree with you, but writing rules has to be a community effort, I can’t visit all websites on the web.

  • Extension switches on every time I close current tab

The extension on/off toggle is now per-tab instead of global. If you switch it off and change tabs, you’ll need to switch it off in the tab to which you switched. It’ll still be off in the tab you left when you come back to it.

  • you added new functionality to disable some of the functions, but they aren't saved when I close or even refresh the tab.

I’m not sure to which function you’re referring. If you’re editing a rule in CleanLinks options, make sure to click the “Save” button. I’m afraid it’s very hard to make an automatically-saving rule editor [1].

  • software usability must be the first priority

  • you don't have the right to decide what is good and what is bad for the end user

The user now has more fine-grained control over what the add-on does. In particular,

  • the add-on can now clean tracking from links, which it couldn’t before.
  • the user can now whitelist or remove legitimate use cases for embedded URLs, whereas before the user could only whitelist a whole domain.

I’m sorry if there is a bit of a learning curve for the new version. I’ve started writing the github wiki pages to help people use it. I’m not a big company doing this for money or widespread use − this is an add-on I found useful that died when Firefox switched over to webextensions exclusively. I’ve rewritten it from scratch and I’m maintaining it in my spare time because I find it useful. It’s up to you whether you want to use it or not.

However, what would help improve CleanLinks a lot are contributions (this is an Open Source project: fork it, modify it, let’s work together and integrate your improvements to CleanLinks!) or specific feedback on usability issues.


[1]: In case the new rule already exists, we need to merge it. However, if it’s an intermediate state while you’re still editing the rule, then how do we un-merge it?

Cimbali avatar Mar 30 '20 09:03 Cimbali

I’ve minimised previous comments as they were about general usability of the add-on. Please open a new issue (or comment on one that already exists and is on that topic) for such problems.

In this issue, users can report websites/parameters for review, to be included in the whitelist.

Please report, for every proposed entry:

  1. domain/path + parameter (or path) to whitelist (or remove)
  2. example link which fails to load
  3. page on which such a link can be found
  4. any further comments
  5. why it should be included

I’ll try to get around to each suggestion and test it and see if it should be included.


For example for this report from a different bug:

  1. On invidio.us/get_video_info the eurl parameter contains the current URL which causes the load to fail
  2. example link: https://invidio.us/get_video_info?html5=1&video_id=tyTTVuG6vcw&cpn=IaObfF5H62G6D93_&eurl=https%3A%2F%2Fwccftech.com%2Fcrytek-invites-developers-to-try-out-cryengine-on-android%2F&el=embedded&hl=en_US&sts=18353&lact=10&c=WEB_EMBEDDED_PLAYER&cver=20200404&cplayer=UNIPLAYER&cbr=Firefox&cbrver=73.0&cos=X11&width=740&height=416&ei=QueMXoS2H-fDxgL2k5-wAg&iframe=1&embed_config=%7B%7D
  3. example page: https://wccftech.com/crytek-invites-developers-to-try-out-cryengine-on-android/
  4. comments: invidio.us is not directly embedded in the page, but it’s a privacy proxy for youtube. Some addons redirect youtube requests to invidio.us.
  5. invidio.us is maybe not very widely used, but it’s targeted at privacy-conscious users, so CleanLinks users are quite likely to be using it.

Cimbali avatar Apr 07 '20 20:04 Cimbali

Parameters which are being redirected here might be of interest to you:

https://gbhackers.com/facebook-tried-to-buy-nso-spyware/

  • One thing I want to say @Cimbali is that you should take my suggestions only after giving them a more thorough inspection. The parameters which I report with some doubt, like "domain", are maybe used in some legitimate cases, so I'm not completely sure whether they should be completely ignored; that's why I specified "in coordination with /auth" (I also found "d" & "D" used in place of "domain" in some cases). The parameters which can easily be seen breaking things without a doubt, like those of Instagram, SoundCloud, Twitter etc., are obviously an easy decision. So please keep checking them first on your machine as well.

Originally posted by @Rtizer-9 in https://github.com/Cimbali/CleanLinks/issues/106#issuecomment-610512203

No worries @Rtizer-9, I do check every suggestion before including it. I think it makes sense to be rather lenient on the login rules though. The only reason I decided not to go for whitelisting all parameters (i.e. .+) on these pages is that whitelists always override remove lists. Since the login rule has no domains specified, I think we should minimise its potential side effects: if a page has /sso/ in its path but does nothing with logins (which is unlikely), then we only whitelist some of its parameters instead of all of them.

Cimbali avatar Apr 08 '20 10:04 Cimbali

  • On Facebook, when you click on "see more posts", CL takes the click to an intermediate blank page. I've tried to reproduce it after disabling CL and the error goes away. The url is something like facebook.com/ajax/feed/substories...

  • Also, when you are viewing a post which has "see more" option and you click on it, the page scrolls to the top. The redirection here is of the pattern facebook.com/profileidhere ---> sameurl with #

Rtizer-9 avatar Apr 09 '20 19:04 Rtizer-9

I think the facebook links should be fixed by the new release. I didn’t have time to try those examples specifically though, but the new version includes a fix for # links.

Cimbali avatar Apr 11 '20 19:04 Cimbali

The instagram rule doesn't work in the following case.

https://indianexpress.com/article/trending/viral-videos-trending/bengaluru-techies-quirky-quarantine-dance-leaves-netizens-laughing-out-loud-6359093/

Url is

https://www.instagram.com/p/B-oVU1eJZDE/embed/?cr=1&v=12&wp=500&rd=https%3A%2F%2Findianexpress.com&rp=%2Farticle%2Ftrending%2Fviral-videos-trending%2Fbengaluru-techies-quirky-quarantine-dance-leaves-netizens-laughing-out-loud-6359093%2F#%7B%22ci%22%3A0%2C%22os%22%3A3800%7D https://indianexpress.com/

Rtizer-9 avatar Apr 12 '20 19:04 Rtizer-9

I was trying to solve the above Instagram breakage using different regexes and had success in a few cases, but .+ and the others were not working for every case. In the end, when I accidentally hovered over the ?, it said that to match all paths you should leave it empty, and that really worked.

Thought I should let you know.

Rtizer-9 avatar Apr 19 '20 10:04 Rtizer-9

.+ matches, but there must be something to match. .* matches even if the path is empty (. = any character, * = 0 or more, + = 1 or more). An empty path means we don’t even try matching, so it should indeed catch everything.
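
To illustrate with plain regular expressions (a minimal sketch, not CleanLinks code):

```js
// Anchored patterns, roughly how a path rule behaves when matched against a whole path.
const oneOrMore  = /^.+$/;  // .+ needs at least one character
const zeroOrMore = /^.*$/;  // .* also accepts the empty string

console.log(oneOrMore.test(""));                        // false: nothing to match
console.log(zeroOrMore.test(""));                       // true
console.log(oneOrMore.test("/p/B-oVU1eJZDE/embed/"));   // true
console.log(zeroOrMore.test("/p/B-oVU1eJZDE/embed/"));  // true
```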

Cimbali avatar Apr 19 '20 15:04 Cimbali

CL breaks page functionality here: https://www.timeanddate.com/worldclock/converter.html (try to add an entry)

The entries in the log are nullmenc(0) -javascript:menc(0)

Rtizer-9 avatar Apr 20 '20 17:04 Rtizer-9

Hey @Cimbali , hope you're safe and doing well.

I just found a bug in the facebook rules. The whole scenario is like this:

If you visit a URL of the pattern https://www.facebook.com/photo.php?fbid=xxxxxxxxxxxxxxx&set=a.xxxxxxxxxxxxxxx&type=3, the fbid parameter gets stripped out because of the rules. But when I looked at ClearUrls' rules, I saw that there are some exceptions and "photo" is one of them. So maybe when you imported the rules from there, the exceptions were not taken care of. I mean, that's the only conclusion one can draw given how CL is playing with those links.

The fbid parameter gets stripped out, and after visiting the link you get the "page isn't available" error, giving the user the illusion that the post is indeed private.

Please also have a look at other such rules which need to have exceptions.

Rtizer-9 avatar May 19 '20 20:05 Rtizer-9

I am reading the wiki, and I didn't see the information I was looking for on how to allow a generally well-behaved website to do complex internal links, but not external ones. Do I put the domain in the parameters of what is allowed, or...?

Faedelity avatar May 01 '23 03:05 Faedelity

In general it’s the target of the link you’re whitelisting. Unless you’re looking at a javascript link, in which case there’s no way of knowing what the target is before executing the javascript and its potential tracking actions − in that case you have to whitelist the origin domain.

So for a website on domain.com:

  • whitelisting parameters and/or path-embedded URLs on *.domain.com/.* will cover internal links on that website and links from other websites to domain.com, basically saying the layout and structure of the pages on this website is legitimate.
  • allowing javascript in links on *.domain.com/.* will cover all javascript actions on that domain without the possibility to filter on whether they redirect you to internal or external pages.

I hope that somewhat answers your question? From what I understand of your problem, if you trust that website, whitelist its redirects (i.e. matching *.domain.com/.*) and that won’t cover (non-javascript) external links.
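
For instance, following the same rules-file structure sketched earlier (again with placeholder names, so a real export may differ), a rule that whitelists all query parameters on *.domain.com/.* could look roughly like:

```js
// Hypothetical rule covering *.domain.com/.*; all names are placeholders.
const rule = {
  ".com": {
    ".domain": {              // the *.domain.com part of the rule
      ".*": {                 // any path on that domain
        "actions": {
          "whitelist": [".*"] // keep all query parameters
        }
      }
    }
  }
};
```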

Cimbali avatar May 01 '23 11:05 Cimbali