CleanLinks: Remove garbage fields matching these... (from Pure-URL addon)

The Pure-URL add-on lets you remove garbage fields simply by specifying the strings you want removed from the link, separated by commas. By default, it removes the following garbage fields:

utm_source, utm_medium, utm_term, utm_content, utm_campaign, utm_reader, utm_place, ga_source, ga_medium, ga_term, ga_content, ga_campaign, ga_place, yclid, _openstat, , fb_action_ids, fb_action_types, fb_ref, fb_source, action_object_map, action_type_map, action_ref_map, , , , ,

How can I configure CleanLinks to block all of these as well? Do I have to use regex rules, and if so, has anyone been able to convert the garbage fields from Pure-URL into CleanLinks rules?
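A minimal sketch of that conversion in Python, assuming CleanLinks accepts a plain regex alternation (names separated by the | delimiter) in its remove field. `to_alternation` is a hypothetical helper, not part of either add-on; the empty entries in the list above are simply skipped:

```python
import re

# Pure-URL's default list as quoted above; empty entries are ignored.
PURE_URL_FIELDS = (
    "utm_source, utm_medium, utm_term, utm_content, utm_campaign, utm_reader, "
    "utm_place, ga_source, ga_medium, ga_term, ga_content, ga_campaign, ga_place, "
    "yclid, _openstat, , fb_action_ids, fb_action_types, fb_ref, fb_source, "
    "action_object_map, action_type_map, action_ref_map, , , , ,"
)


def to_alternation(fields: str) -> str:
    """Join a comma-separated field list into one regex with | between names."""
    names = [name.strip() for name in fields.split(",") if name.strip()]
    # re.escape keeps every name literal, in case a name contains regex symbols.
    return "|".join(re.escape(name) for name in names)


pattern = to_alternation(PURE_URL_FIELDS)
print(pattern)
```

The result is a long literal list; the shorter hand-written patterns later in this thread (e.g. `utm_\w+`) cover the same names with wildcards.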

Thanks.

P.S. I noticed CleanLinks whitelists a lot of domains, including www.facebook.com. Is it likely these sites will break if I remove them from the whitelist (i.e. have them cleaned as well)? Apparently sites like Facebook are massive offenders when it comes to "dirty" links. In what way do the sites break, if they do?

Jun 26 '16 02:06 rieje

I do not understand regex at all.

I came up with this string, testing with http://regexr.com/

Which is likely to be completely wrong! :)

Although it did appear to catch the extra PL tags. I wish it were easier to do.

Jun 30 '16 11:06 GitCurious

Thanks, I will give it a try for a while and see how it goes. I hope others can do the same and report back, or improve on it if necessary :)

Jul 06 '16 06:07 rieje

I think this should work:

(?:ref|aff)\w*|utm_\w+|(?:merchant|programme|media)ID|ga_\w*|fb_\w*|yclid\w*|action_\w*|_openstat\w*

or

(?:ref|aff)\w*|utm_\w+|(?:merchant|programme|media)ID|(?:ga_|fb_|yclid|action_|_openstat)\w*

regex	short explanation
|	delimiter: separates alternatives (OR)
(?:xxx)	non-capturing group
xxx	match > remove
\w*	match 0 or more word characters (letters, digits, underscore) > remove
\w+	match 1 or more word characters > remove
\w{x}	match exactly x word characters > remove
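To see the table entries in action, here is a small Python check. It assumes each pattern is tested against a whole parameter name (full match), which is an illustration rather than a statement of how CleanLinks applies it; `re.ASCII` limits `\w` to `[A-Za-z0-9_]`, as in JavaScript. The `id\w{3}` pattern is a made-up example:

```python
import re

# Assumption: a pattern is tested against the whole parameter name (full match).
# re.ASCII limits \w to [A-Za-z0-9_], which is what \w means in JavaScript.
def matches(pattern: str, name: str) -> bool:
    return re.fullmatch(pattern, name, re.ASCII) is not None


examples = [
    (r"fb_\w*", "fb_"),                 # \w*   zero extra characters is enough
    (r"utm_\w+", "utm_"),               # \w+   needs at least one more character
    (r"utm_\w+", "utm_source"),
    (r"id\w{3}", "id123"),              # \w{3} exactly three word characters
    (r"id\w{3}", "id12"),
    (r"(?:ga_|fb_)\w*", "ga_medium"),   # (?:...) groups alternatives
    (r"ga_\w*|fb_\w*", "ga_medium"),    # same names, written without the group
]

for pattern, name in examples:
    print(f"{pattern:<16} {name:<12} {matches(pattern, name)}")
```

The last two rows show why the second, grouped pattern is "the same" as the first: factoring a shared `\w*` out of the alternatives does not change which names match.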

P.S. @diegocr Please give us bigger text fields in the settings!

Edit: Some additional tracking keywords:

Amazon: ascsubtag|bbn|tag|pf_|SubscriptionId|linkCode|camp|creative|creativeASIN (qid struck out, see below)

I noticed that Yahoo uses qid on Yahoo! Answers to identify individual questions, but it also uses its tracking IDs gprid|pvid. Does anyone know what qid refers to on Amazon?

Answer from Stack Overflow:

qid=1387193124 is a Unix timestamp of when the URL was generated, in this case December 16th, 2013 at 11:25:24 GMT.
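A quick way to check what date a qid value encodes, assuming (as the quoted answer says) that it is a Unix timestamp in seconds:

```python
from datetime import datetime, timezone

# Value from the quoted answer, read as seconds since 1970-01-01 00:00 UTC.
qid = 1387193124
print(datetime.fromtimestamp(qid, tz=timezone.utc).isoformat())
# 2013-12-16T11:25:24+00:00
```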

I think in this case qid is not relevant.

YouTube: feature

Jul 28 '16 22:07 geokis

Hi guys. Could you please explain why CleanLinks does not remove "?ws_ab_test..." and what can be done to fix this? Thank you.

Aug 09 '16 14:08 codeshark1

An example link would be useful, e.g.:

https://de.aliexpress.com/item/TK1327/32409702785.html?ws_ab_test=searchweb201556_10,searchweb201602_5_10057_10056_10055_10049_10017_405_404_10059_10058_10040_10060,searchweb201603_7&btsid=190ca3d2-af2b-411f-abd2-2d035219b767

Try: ws_\w*. To clean the link completely, you need to add btsid\w* as well.

Don't forget the | delimiter: ws_\w*|btsid\w*
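A rough Python approximation of what that rule does to the AliExpress link above. It assumes every query parameter whose name fully matches the pattern is dropped; `clean_url` is a hypothetical helper, and CleanLinks' own matching may differ:

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Remove pattern from the post above.
REMOVE = r"ws_\w*|btsid\w*"


def clean_url(url: str, remove: str = REMOVE) -> str:
    parts = urlsplit(url)
    rx = re.compile(remove, re.ASCII)
    # Keep only the parameters whose names do NOT fully match the pattern.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not rx.fullmatch(key)
    ]
    # urlencode re-encodes kept values, so they may differ byte-for-byte.
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = ("https://de.aliexpress.com/item/TK1327/32409702785.html?ws_ab_test="
       "searchweb201556_10,searchweb201602_5_10057_10056_10055_10049_10017_405_404_"
       "10059_10058_10040_10060,searchweb201603_7"
       "&btsid=190ca3d2-af2b-411f-abd2-2d035219b767")
print(clean_url(url))
```

With both names removed the query is empty, so `urlunsplit` also drops the `?`.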

Edit: I would appreciate a user-maintained trash/tracking link database. I think this thread is not a bad start; maybe @rieje can edit the opening post with the new entries.

Aug 10 '16 14:08 geokis

We do need a trash/tracking database +1

Aug 10 '16 15:08 GitCurious

Aug 14 '16 23:08 codeshark1

Did you check the "Use HTTP Observer" option in the settings panel?

If I uncheck it, the example link above is not cleaned.

Aug 15 '16 06:08 GitCurious

@GitCurious I tried with HTTP Observer both checked and unchecked, and still no luck...

Aug 15 '16 12:08 codeshark1

@codeshark1:

I added ws_\w*|btsid\w* to the end, like this: (?:ref|aff)\w*|utm_\w+|(?:merchant|programme|media)ID|ws_\w*|btsid\w* ...

This should work.

Alternatively, you can try this:

(?:ref|aff|ws_|btsid)\w*|utm_\w+|(?:merchant|programme|media)ID

but it is pretty much the same.
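A small Python check that the two patterns accept exactly the same parameter names, assuming names are tested with a full match. The sample names are chosen for illustration:

```python
import re

LONG = r"(?:ref|aff)\w*|utm_\w+|(?:merchant|programme|media)ID|ws_\w*|btsid\w*"
SHORT = r"(?:ref|aff|ws_|btsid)\w*|utm_\w+|(?:merchant|programme|media)ID"

names = ["ref", "affid", "utm_source", "utm_", "mediaID", "ws_ab_test",
         "btsid", "q", "id"]

for name in names:
    in_long = re.fullmatch(LONG, name, re.ASCII) is not None
    in_short = re.fullmatch(SHORT, name, re.ASCII) is not None
    print(f"{name:<12} long={in_long} short={in_short}")
```

Moving `ws_` and `btsid` into the group only shares the trailing `\w*`; it does not change the set of matched names.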

Both work for me. In the settings, all checkboxes are checked except:

ignore no-https links

Maybe you have an entry under [Skip Links matching with:] that contains "item"? Remove it and try again.
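Why an "item" entry would matter, sketched in Python under a loudly stated assumption: that each skip entry is a regex searched anywhere in the URL and that a matching URL is left uncleaned. `is_skipped` is a hypothetical helper, not CleanLinks code:

```python
import re
from typing import Iterable

# Hypothetical model of the [Skip Links matching with:] list: each entry is a
# regex searched anywhere in the URL, and a matching URL is not cleaned at all.
def is_skipped(url: str, skip_entries: Iterable[str]) -> bool:
    return any(re.search(entry, url) for entry in skip_entries if entry)


url = "https://de.aliexpress.com/item/TK1327/32409702785.html?ws_ab_test=x&btsid=y"
print(is_skipped(url, ["item"]))    # True: "item" occurs in the path
print(is_skipped(url, ["paypal"]))  # False
```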

Edit:

From: #140

Yahoo Search Image: _yl\w*|gprid\w*|pvid\w*

or

(?:_yl\w*|\w*id)\w*

\w*id would match all kinds of names like xxxid (xxx = any number of characters). This caused problems with other sites too often, so stick with:

(?:_yl\w*|gprid|pvid)\w*
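A Python comparison of the broad and the narrow Yahoo pattern, assuming full-match testing of parameter names. Only gprid, pvid and the `_yl` prefix come from the thread; `_ylt`, `sid`, `productid` and `guide` are illustrative names:

```python
import re

BROAD = r"(?:_yl\w*|\w*id)\w*"
NARROW = r"(?:_yl\w*|gprid|pvid)\w*"

for name in ["gprid", "pvid", "_ylt", "sid", "productid", "guide"]:
    broad = re.fullmatch(BROAD, name, re.ASCII) is not None
    narrow = re.fullmatch(NARROW, name, re.ASCII) is not None
    print(f"{name:<10} broad={broad} narrow={narrow}")
```

Because of the trailing `\w*`, the broad form catches any name that merely contains "id" (even "guide"), which is the kind of collateral damage described above.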

Edit 2:

Google Search: (?:gs_l|gclid|ei)\w*

Google Image Search: ved

eBay: (?:_qi|clk_rvr_id|_trk)\w*

affilinet: subid
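The per-site fragments above can be joined with the | delimiter into one remove pattern. Assuming the combined pattern is applied to every (non-whitelisted) site and tested as a full match on parameter names, this Python sketch shows that a short prefix such as `ei\w*` also catches unrelated names like the illustrative `eid`:

```python
import re

FRAGMENTS = [
    r"(?:gs_l|gclid|ei)\w*",           # Google Search
    r"ved",                            # Google image search
    r"(?:_qi|clk_rvr_id|_trk\w*)\w*",  # eBay
    r"subid",                          # affilinet
]
combined = "|".join(FRAGMENTS)
rx = re.compile(combined, re.ASCII)

for name in ["gclid", "ei", "ved", "_trksid", "subid", "q", "eid"]:
    print(f"{name:<8} {rx.fullmatch(name) is not None}")
```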

Aug 15 '16 13:08 geokis

@geokis

Can you help me understand why the "groups" are not written like this:

In just one single group... rather than in separate groups?

Aug 16 '16 06:08 GitCurious

Hi @GitCurious

(?:merchant|programme|media)ID

This expression only matches these names:

merchantID
programmeID
mediaID

Caution: CL is case-sensitive, so id ≠ ID!
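A Python illustration of the case-sensitivity point, assuming full-match testing; `re.IGNORECASE` is shown only for contrast, not as a CleanLinks setting:

```python
import re

PATTERN = r"(?:merchant|programme|media)ID"

for name in ["merchantID", "mediaID", "mediaid", "mediaIDs"]:
    exact = re.fullmatch(PATTERN, name) is not None
    folded = re.fullmatch(PATTERN, name, re.IGNORECASE) is not None
    print(f"{name:<11} case-sensitive={exact} ignore-case={folded}")
```

Under the full-match assumption, `mediaIDs` is rejected either way because nothing in the pattern allows the trailing "s".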

If you use this kind of expression:

(?:merchant|programme|media)\w*

It will match:

merchantxxx
programmexxx
mediaxxx

(xxx = any word characters, i.e. letters, digits or underscores, including none at all)

So the difference is that the ID expression is more specific and less likely to interfere with similarly named parameters in the URL. Both would work.

E.g., my suggestion in the post above, gprid|pvid, would also work as (?:gpr|pv)id, and it is more specific than \w*id.
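A Python sketch of the specific vs. broad trade-off, again assuming full-match testing of parameter names; `mediaType` and `programmeName` are made-up names, used only to show what the broad form would also remove:

```python
import re

SPECIFIC = r"(?:merchant|programme|media)ID"
BROAD = r"(?:merchant|programme|media)\w*"

for name in ["mediaID", "merchant", "mediaType", "programmeName"]:
    specific = re.fullmatch(SPECIFIC, name, re.ASCII) is not None
    broad = re.fullmatch(BROAD, name, re.ASCII) is not None
    print(f"{name:<14} specific={specific} broad={broad}")

# gprid|pvid and (?:gpr|pv)id accept exactly the same names.
for name in ["gprid", "pvid", "gpr", "id"]:
    assert (re.fullmatch(r"gprid|pvid", name) is None) == (
        re.fullmatch(r"(?:gpr|pv)id", name) is None
    )
```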

Aug 16 '16 19:08 geokis

Thank you, geokis.

I must be as dumb as a bag of rocks.

I did not see that there was no delimiter between the closing parenthesis and the "ID", so that explains it exactly...

So I have learned a little bit more about regex formatting :)

Aug 16 '16 20:08 GitCurious

@geokis Have you found a rule that you've stuck with that is more restrictive than the default settings and strikes a good balance between "cleanliness" and usability? I actually know enough regex to make my own rules, but the issue is that I have almost zero knowledge of which attributes can be blocked without issues and which are essential, i.e. removing them breaks site functionality (the shopping cart on amazon.com, for example).

If you have a more restrictive rule you can share then it would simply be a matter of using it and removing problematic ones if an issue is encountered.

Jan 31 '17 01:01 rieje
