
Optional (But Default On) Tracker URL Removal So That Searches Don't as Easily Fail

Open mollyrealized opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.

It was a judgment call whether to file this as a bug or a feature request, but I think it's more the latter. When a URL carries Google Analytics tracking parameters (the ubiquitous "utm_source", "utm_medium", and "utm_campaign"), the lookup will usually fail when it is piped over to archive.today (archive.is, archive.ph, etc.). I suspect the same happens with some other trackers as well, but don't know for sure.

Describe the solution you'd like

It would be useful if by default Web Archives stripped trackers out of the URL being looked up on archives, with perhaps a setting to disable that entirely, or disable it per lookup.

Describe alternatives you've considered

I presently hand-remove the trackers and rerun the request. It works; it's just an annoyance. :) I've also brought it up with archive.today, but it seems to have fallen into their bit bucket of "maybe someday".

mollyrealized, Sep 24 '23 18:09

We could use an existing filter list, but I'd like to avoid writing our own filter list parser, and from a quick search I couldn't find a compact js package for parsing static filter lists that has a permissive license.

https://github.com/DandelionSprout/adfilt/blob/master/ClearURLs%20for%20uBo/clear_urls_uboified.txt

dessant, Feb 06 '24 09:02

I definitely agree that an existing filter list would be useful, but even as a first step, removing the Google tracking parameters would go a long way. Perhaps something like:

[?&]utm_[^&]+

mollyrealized, Feb 06 '24 15:02
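
One caveat with the regex suggested above: if it is used as a plain search-and-replace, a tracker in the first position takes the leading "?" with it, so `https://example.com/a?utm_source=x&id=5` would become `https://example.com/a&id=5`. It would also swallow a trailing `#fragment`, since `[^&]+` does not stop at `#`. Below is a minimal sketch of the same first step done with the standard `URL` and `URLSearchParams` APIs instead. The function name `stripUtmParams` is hypothetical and not part of Web Archives, and the sketch only covers `utm_*` parameters, not a full filter list.

```javascript
// Hypothetical helper, not taken from the Web Archives code base.
// Removes every query parameter whose name starts with "utm_".
function stripUtmParams(rawUrl) {
  const url = new URL(rawUrl);

  // Snapshot the keys first: deleting while iterating the live
  // URLSearchParams object could skip entries.
  for (const key of [...url.searchParams.keys()]) {
    if (key.toLowerCase().startsWith('utm_')) {
      url.searchParams.delete(key);
    }
  }

  // The URL object drops the "?" on its own once no parameters remain,
  // and the fragment is kept separately from the query.
  return url.toString();
}
```

Note that `URLSearchParams` re-serializes the remaining query when something is deleted, so the encoding of other parameters may change slightly (for example, `%20` can come back as `+`). Whether that matters for archive lookups would need to be checked.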