tsurlfilter

ad filtering syntax: `removeparam` modifier doesn't help to clean junk query params that are preceded by a hash (`#`)

Open kirisakow opened this issue 1 year ago • 6 comments

Prerequisites

  • [X] I checked the documentation and found no answer;
  • [X] I checked to make sure that this issue has not already been filed;
  • [X] This is not an ad/bug report.

Problem description

  • Take a URL with a few fairly typical junk query params (at_medium, at_campaign, etc.), but preceded by a # instead of a ?:
https://www.france.tv/films/5606040-une-affaire-privee.html#at_medium=5&at_campaign_group=2&at_campaign=integrale&at_offre=1&at_send_date=20240106&at_recipient_id=459386-1664366309-2d5f2440
  • Take a custom filter rule (which works fine as long as the query params are preceded by a ? or an &):
||*$removeparam=/^(at|ul|utm)_/

Expected behavior

With the aforementioned rule being set, the URL should be rendered as

https://www.france.tv/films/5606040-une-affaire-privee.html

Actual behavior

The URL remains unchanged. On the other hand, the filter works all right as long as the query params are preceded by a ? or an &.
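The difference can be reproduced outside the extension with a minimal sketch. It assumes that parameter removal only looks at the query component (the part between ? and #); `strip_query_params` is a hypothetical helper written for this illustration, not tsurlfilter's actual implementation, and the URLs are shortened variants of the one above.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

JUNK = re.compile(r"^(at|ul|utm)_")  # same pattern as in the rule above

def strip_query_params(url: str, pattern: re.Pattern = JUNK) -> str:
    """Hypothetical helper: drop query params whose names match `pattern`.

    Only the query component (between '?' and '#') is inspected;
    the fragment is passed through untouched.
    """
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not pattern.search(k)]
    return urlunsplit(parts._replace(query=urlencode(kept)))

base = "https://www.france.tv/films/5606040-une-affaire-privee.html"
with_query = base + "?at_medium=5&at_campaign=integrale"
with_hash = base + "#at_medium=5&at_campaign=integrale"

print(strip_query_params(with_query))  # junk params removed
print(strip_query_params(with_hash))   # unchanged: the params live in the fragment
```

A standard URL parser places everything after # in the fragment, so a query-only cleaner never sees those params.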

Proposed solution

Not sure, but probably the ^ set of separator characters (separator marks) should also include the hash (#).

Excerpt from the KB:

Special characters

(...)

  • ^ — a separator character mark. Separator character is any character, but a letter, a digit, or one of the following: _ - . %. In this example separator characters are shown in bold: http://example.com/?t=1&t2=t3. The end of the address is also accepted as separator.
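Read literally, the excerpt already treats # as a separator, since it is neither a letter, a digit, nor one of _ - . %. The following sketch encodes that definition; it assumes "letter" and "digit" mean ASCII letters and digits, and `is_separator` is a hypothetical name used only here.

```python
import string

# Assumption: "letter" and "digit" in the KB excerpt mean ASCII letters and digits.
NON_SEPARATORS = set(string.ascii_letters + string.digits + "_-.%")

def is_separator(ch: str) -> bool:
    """Return True if a single character counts as a separator per the KB excerpt."""
    return ch not in NON_SEPARATORS

for ch in "?&=#/_-":
    print(repr(ch), is_separator(ch))
```

If that reading is right, extending the ^ set would probably not change the outcome, and the cause is more likely that the params sit in the URL fragment.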

Additional information

No response

kirisakow avatar Jan 07 '24 22:01 kirisakow

Parameters after # are not sent to the server. (image)

Where did you get this link?
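To illustrate why the part after # never reaches the server: an HTTP request line is built from the path and query only. The sketch below uses a hypothetical `request_target` helper and a shortened version of the reported URL.

```python
from urllib.parse import urlsplit

def request_target(url: str) -> str:
    """Hypothetical helper: path and query as they would appear in an HTTP request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target

url = ("https://www.france.tv/films/5606040-une-affaire-privee.html"
       "#at_medium=5&at_campaign=integrale")

print(request_target(url))      # path only, no '#...' part
print(urlsplit(url).fragment)   # the fragment stays on the client side
```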

Alex-302 avatar Jan 11 '24 12:01 Alex-302

I think the reporter's concern is that a website can still read the hash via location.hash and send it using XHR/fetch.

piquark6046 avatar Jan 11 '24 12:01 piquark6046

Where did you get this link?

From their newsletter.

For instance, see its last issue:

https://t.nl.francetv.fr/r/?id=hc9785a1,6c509b4c,5fd1bede&p1=%40UYYMNRAcUXhUaReRIlHftNHzxSzNS0B3t5dfPCFeDjM%3D&p2=20240110&p3=459386-1664366309-2d5f2440

On redirect to the target URL (the issue layout), you'll see the aforementioned hash-preceded params appended to the end of that URL.

You'll also see them appended to each content URL featured in the issue.

kirisakow avatar Jan 11 '24 15:01 kirisakow

I think the reporter's concern is that a website can still read the hash via location.hash and send it using XHR/fetch.

Thank you for the concern. There's another reason why I want my URLs to be clean of any garbage query params: so that I don't have to clean them manually when I save or share them.
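For the save-or-share use case, the cleanup can be approximated outside any filter rule. This is a minimal sketch, assuming the fragment holds &-separated name=value items; `strip_fragment_params` is a hypothetical helper, not an existing AdGuard feature, and the URL is a shortened version of the one reported above.

```python
import re
from urllib.parse import urlsplit, urlunsplit

JUNK = re.compile(r"^(at|ul|utm)_")  # same pattern as in the filter rule

def strip_fragment_params(url: str, pattern: re.Pattern = JUNK) -> str:
    """Hypothetical helper: drop '&'-separated fragment items whose names match `pattern`."""
    parts = urlsplit(url)
    kept = [item for item in parts.fragment.split("&")
            if item and not pattern.search(item.split("=", 1)[0])]
    return urlunsplit(parts._replace(fragment="&".join(kept)))

url = ("https://www.france.tv/films/5606040-une-affaire-privee.html"
       "#at_medium=5&at_campaign_group=2&at_campaign=integrale")

print(strip_fragment_params(url))
```

Fragment items that do not match the pattern, such as an ordinary in-page anchor, are kept as they are.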

kirisakow avatar Jan 11 '24 15:01 kirisakow

For instance, see its last issue:

I need the address of the page that adds the parameters to the links. $removeparam can't remove them, because the #... part is either added by JS or already present in the links in the HTML.

Alex-302 avatar Jan 12 '24 16:01 Alex-302

$removeparam can't remove them, because the #... part is either added by JS or already present in the links in the HTML.

Is there any modifier other than removeparam that can remove hash params? If not, shouldn't one be created? Is it possible at all?

kirisakow avatar Jan 12 '24 17:01 kirisakow