CleanLinks
Remove garbage fields matching these... (from Pure-URL addon)
Pure-URL addon lets you remove garbage fields simply by specifying the strings you would like to remove from the link, separated by commas. By default, it removes the following garbage fields:
utm_source, utm_medium, utm_term, utm_content, utm_campaign, utm_reader, utm_place, ga_source, ga_medium, ga_term, ga_content, ga_campaign, ga_place, yclid, _openstat, fb_action_ids, fb_action_types, fb_ref, fb_source, action_object_map, action_type_map, action_ref_map
How can I configure Clean Links to block all of these as well? Do I have to use regex rules and if so, has anyone been able to convert the garbage fields from Pure-URL to rules for Clean Links?
Thanks.
P.S. I noticed Clean Links whitelists a lot of domains, including www.facebook.com. I was wondering whether these sites are likely to break if I removed them from the whitelist (i.e. had their links cleaned as well), because sites like Facebook are apparently massive offenders when it comes to "dirty" links. In what manner do the sites break, if they do?
I do not understand regex at all.
I came up with this string, testing with http://regexr.com/
(?:ref|aff)\w*|ga_\w+|fb_\w+|\w*utm_\w+|(?:merchant|programme|media|)yclid|_openstat|action_object_map|action_type_map|action_ref_map|ID
Which is likely to be completely wrong! :)
Although it did appear to catch the extra PL tags. I wish it were easier to do.
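For anyone who wants to sanity-check a draft rule like this before pasting it into CleanLinks, here is a minimal sketch (an assumption for illustration: the rule is matched against each query-parameter name in full, so `fullmatch` is used here; the sample parameter names are hypothetical):

```python
import re

# The draft pattern from above, split for readability
pattern = re.compile(
    r"(?:ref|aff)\w*|ga_\w+|fb_\w+|\w*utm_\w+|"
    r"(?:merchant|programme|media|)yclid|_openstat|"
    r"action_object_map|action_type_map|action_ref_map|ID"
)

params = ["utm_source", "fb_ref", "yclid", "_openstat", "page", "q"]
removed = [p for p in params if pattern.fullmatch(p)]
print(removed)  # which parameter names the rule would strip
```

Running this shows that `page` and `q` survive while the tracking fields are caught.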
Thanks, I will give it a try for a while and see how it goes--hope others can do the same and report back or improve on it if necessary :)
I think this should work:
(?:ref|aff)\w*|utm_\w+|(?:merchant|programme|media)ID|ga_\w*|fb_\w*|yclid\w*|action_\w*|_openstat\w*
or
(?:ref|aff)\w*|utm_\w+|(?:merchant|programme|media)ID|(?:ga_|fb_|yclid|action_|_openstat)\w*
| regex | short explanation |
|---|---|
| **\|** | OR (alternation between patterns) |
| (?:xxx) | group (non-capturing) |
| xxx | match literal xxx > remove |
| \w* | match 0 or more word characters > remove |
| \w+ | match 1 or more word characters > remove |
| \w{x} | match exactly x word characters > remove |
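As a quick check that the two variants of the rule behave identically, here is a sketch (assumptions: the rule is tested against each parameter name in full via `fullmatch`; the sample names are made up; the Yandex parameter is spelled yclid):

```python
import re

long_form = re.compile(
    r"(?:ref|aff)\w*|utm_\w+|(?:merchant|programme|media)ID|"
    r"ga_\w*|fb_\w*|yclid\w*|action_\w*|_openstat\w*")
short_form = re.compile(
    r"(?:ref|aff)\w*|utm_\w+|(?:merchant|programme|media)ID|"
    r"(?:ga_|fb_|yclid|action_|_openstat)\w*")

names = ["ref", "affid", "utm_campaign", "merchantID", "ga_source",
         "fb_ref", "action_type_map", "_openstat", "yclid", "id", "page"]
removed = [n for n in names if short_form.fullmatch(n)]
# Both forms strip exactly the same names
assert removed == [n for n in names if long_form.fullmatch(n)]
print(removed)
```

Note that `id` and `page` are left alone (CL is case-sensitive, so `id` does not match the literal `ID`).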
P.S. @diegocr Please give us bigger text fields in the settings!
edit Some additional tracking keywords:
Amazon :
ascsubtag|~~qid|~~bbn|tag|pf_|SubscriptionId|linkCode|camp|creative|creativeASIN
I noticed that Yahoo uses qid on Yahoo! Answers as the identifier for the various questions, but it also uses the tracking ids gprid|pvid.
Does anyone know what Amazon relates to qid?
Answer from stackoverflow
qid=1387193124 is a unix timestamp of when the URL was generated, in this case December 16th, 2013 at 11:25:24 GMT
I think in this case qid is not relevant.
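You can decode such a qid value yourself to check what it encodes; a quick standard-library sketch:

```python
from datetime import datetime, timezone

qid = 1387193124  # the value quoted above
generated = datetime.fromtimestamp(qid, tz=timezone.utc)
print(generated.isoformat())
```

This prints a timestamp in December 2013, which supports the "time the URL was generated" interpretation rather than a per-item tracking id.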
YouTube :
feature
Hi guys. Could you please point out why CleanLinks does not remove "?ws_ab_test..." and what can be done to fix this? Thank you
An example link would be useful, e.g.:
https://de.aliexpress.com/item/TK1327/32409702785.html?ws_ab_test=searchweb201556_10,searchweb201602_5_10057_10056_10055_10049_10017_405_404_10059_10058_10040_10060,searchweb201603_7&btsid=190ca3d2-af2b-411f-abd2-2d035219b767
Try: ws_\w*
To clean the link completely you need to add btsid\w* as well.
Don't forget the delimiter |
ws_\w*|btsid\w*
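For reference, a rough sketch of what a cleaner does with that rule, under the assumption that the rule is matched against each query-parameter name (this is a simplified illustration, not how the addon is actually implemented; the URL is a shortened version of the AliExpress example above):

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

RULE = re.compile(r"ws_\w*|btsid\w*")

def clean(url: str) -> str:
    """Drop every query parameter whose name matches RULE."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not RULE.fullmatch(k)]
    return urlunsplit(parts._replace(query=urlencode(kept)))

url = ("https://de.aliexpress.com/item/TK1327/32409702785.html"
       "?ws_ab_test=searchweb201556_10&btsid=190ca3d2&page=2")
print(clean(url))
```

Only the hypothetical `page=2` parameter survives; `ws_ab_test` and `btsid` are stripped.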
edit I would appreciate a user-built trash/tracking link database. I think this thread is not a bad start; maybe @rieje can edit the opening post with the new links.
We do need a trash/tracking database +1
@geokis
Thanks but I'm a total noob when it comes to regular expressions and stuff like that...
My "remove from links" field reads like this (it's addon default):
(?:ref|aff)\w*|utm_\w+|(?:merchant|programme|media)ID
I added ws_\w*|btsid\w* to the end, like this
(?:ref|aff)\w*|utm_\w+|(?:merchant|programme|media)ID|ws_\w*|btsid\w*
And apparently it did nothing
Did you check the "Use HTTP Observer" option in the settings panel ?
If I UNcheck that - then the example link above is not cleaned
@GitCurious I tried with both checked and unchecked HTTP Observer, and still no luck...
@codeshark1:
I added ws_\w*|btsid\w* to the end, like this
(?:ref|aff)\w*|utm_\w+|(?:merchant|programme|media)ID|ws_\w*|btsid\w*...
This should work.
Alternatively you can try this:
(?:ref|aff|ws_|btsid)\w*|utm_\w+|(?:merchant|programme|media)ID
but it is pretty much the same.
Both work for me. In settings all checkboxes are active, except:
ignore no-https links
Maybe you have an entry under [Skip Links matching with:] that matches item?
Remove it and try again.
edit
From: #140
Yahoo Search Image:
_yl\w*|gprid\w*|pvid\w*
or
(?:_yl\w*|\w*id)\w*
\w*id would catch everything of the form xxxid
(xxx = any number of word characters)
But this too often caused problems with other sites. Stick with this:
(?:_yl\w*|gprid|pvid)\w*
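To see why the broad \w*id variant is risky, here is a small check (the "innocent" parameter names are hypothetical examples of legitimate parameters a site might use):

```python
import re

greedy = re.compile(r"(?:_yl\w*|\w*id)\w*")       # the broad variant
specific = re.compile(r"(?:_yl\w*|gprid|pvid)\w*")  # the safer variant

# Hypothetical legitimate parameter names the broad rule would eat:
innocents = ["vid", "bid", "itemid", "videoid"]
eaten = [n for n in innocents if greedy.fullmatch(n)]
kept = [n for n in innocents if not specific.fullmatch(n)]
print(eaten, kept)
```

The greedy rule strips all four, while the specific rule leaves them all alone, which is exactly the "problems with other sites" issue.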
edit2
Google Search:
(?:gs_l|gclid|ei)\w*

Google Search Image:
ved

ebay:
(?:_qi|clk_rvr_id|_trk\w*)\w*

affilinet:
subid
@geokis
Can you help me understand why the "groups" are not written like this:
(?:ref|aff|merchant|programme|media|ga_|fb_|yclid|action_|_openstat) ......
In just one single group...rather than separate groups ?
Hi @GitCurious
(?:merchant|programme|media)ID
This expression will only match this kind of structure:
- merchantID
- programmeID
- mediaID
!Caution CL is case-sensitive, so id ≠ ID
If you use this kind of expression:
(?:merchant|programme|media)\w*
It will match:
- merchantxxx
- programmexxx
- mediaxxx
(xxx = any word characters/numbers)
So the difference is that the ID expression is more specific and less likely to interfere with similar parameter names in the URL. Both would work.
E.g.: My suggestion in the post above:
gprid|pvid would also work as (?:gpr|pv)id, and it is more specific than \w*id
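A quick sketch checking that equivalence (the near-miss names are hypothetical, chosen to probe the boundaries of each pattern):

```python
import re

explicit = re.compile(r"gprid|pvid")
factored = re.compile(r"(?:gpr|pv)id")

# Real targets plus hypothetical near-misses
names = ["gprid", "pvid", "pid", "grid", "id", "pvida"]
matches = [n for n in names if factored.fullmatch(n)]
# Factored and explicit forms agree on every name
assert matches == [n for n in names if explicit.fullmatch(n)]
print(matches)
```

Only the two intended tracking ids match; the near-misses like `pid` and `grid` (which the broad `\w*id` would catch) are untouched.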
Thank you geokis
I must be as dumb as a bag of rocks.
I did not see that there was no delimiter between the closing parenthesis and the "ID", so that explains it exactly....
so I have learned a little bit more about regex formatting :)
@geokis Have you found a rule that you've stuck with that is more restrictive than the default settings and strikes a good balance between "cleanliness" and usability? I actually know some regex and can make my own rules, but the issue is I have almost zero knowledge of which kinds of parameters can be blocked with no issues and which are essential and will break site functionality when removed (like the shopping cart on amazon.com, for example).
If you have a more restrictive rule you can share then it would simply be a matter of using it and removing problematic ones if an issue is encountered.