CoreLibs

Add an option to decode URL in `$urltransform`

Open · AdamWr opened this issue 1 year ago · 7 comments

Issue Details

Related to https://github.com/AdguardTeam/CoreLibs/issues/1557#issuecomment-2351459285

Currently, to redirect to another origin from a link that carries the destination page in one of its parameters, when that part of the URL is percent-encoded, some characters have to be decoded first. For example, this link:

https://track.effiliation.com/servlet/effi.redir?id_compteur=12305754&effi_id=1646343493&url=https%3A%2F%2Ffr.shopping.rakuten.com%2Foffer%2Fshop%2F11769144290%2Fdyson-v8-absolute-aspirateur.html%3FsellerLogin%3DBoulanger

The destination page is in the `url` parameter, but it is percent-encoded:

https%3A%2F%2Ffr.shopping.rakuten.com%2Foffer%2Fshop%2F11769144290%2Fdyson-v8-absolute-aspirateur.html%3FsellerLogin%3DBoulanger

so these characters need to be decoded:

%3A (:)
%2F (/)
%3F (?)
%3D (=)
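For comparison, full percent-decoding of that parameter takes only a couple of standard-library calls. A minimal Python sketch using `urllib.parse`; it only illustrates what a decoding option would have to produce and says nothing about how AdGuard implements anything:

```python
from urllib.parse import parse_qs, unquote, urlsplit

redirect = (
    "https://track.effiliation.com/servlet/effi.redir?id_compteur=12305754"
    "&effi_id=1646343493&url=https%3A%2F%2Ffr.shopping.rakuten.com%2Foffer"
    "%2Fshop%2F11769144290%2Fdyson-v8-absolute-aspirateur.html%3FsellerLogin%3DBoulanger"
)

# Option 1: take the raw value of the `url` parameter and percent-decode it.
raw_value = redirect.split("url=", 1)[1].split("&", 1)[0]
print(unquote(raw_value))

# Option 2: parse_qs splits the query string and percent-decodes every value.
query = parse_qs(urlsplit(redirect).query)
print(query["url"][0])
```

Both calls print the Rakuten product URL with `:`, `/`, `?` and `=` restored, which is what the four separate substitution rules achieve one character at a time.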

These rules:

/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3A/:/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%2F/\//
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3F/?/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3D/=/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)/\$1/

seem to work fine, but if there were a URL-decoding option, a single rule would be enough, something like:

/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)/\$1/decodeURL

Proposed solution

Add an option to decode the URL, perhaps as an additional modifier. Alternatively, if this is already possible or can easily be done in a single rule, it would be nice to add that to the documentation.

Alternative solution

No response

AdamWr · Sep 15 '24

I think it would be best if this were a separate modifier:

/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)/\$1/,decodeurl

It could be implemented as part of the `$urltransform` modifier, decoding the redirect target before redirecting to it.

cxplay · Sep 15 '24

The ability to decode Base64 would also be nice.

Or, even better, an option to decode Base64 several times. There is a real-world case in https://github.com/AdguardTeam/AdguardFilters/issues/190685, where the website contains links like this one:

https://b.myfirstdollar.org/#!WVVoU01HTklUVFpNZVRrelpETmpkV0pYVm10aFYwWnRZVmhLYkV4dFRuWmlVemx0WVZkNGJFd3lORFZPYW1ONldWUk9NR1ZxUm5aYVYyaHhZVk01VlZSV1NrVlZhM2hhVkVVMVJWUkZVbE5STVVwSFZrUkZNRkV3VVhoWFJFVjFZMjFHZVV3eVduQmlSMVU5

This part:

WVVoU01HTklUVFpNZVRrelpETmpkV0pYVm10aFYwWnRZVmhLYkV4dFRuWmlVemx0WVZkNGJFd3lORFZPYW1ONldWUk9NR1ZxUm5aYVYyaHhZVk01VlZSV1NrVlZhM2hhVkVVMVJWUkZVbE5STVVwSFZrUkZNRkV3VVhoWFJFVjFZMjFHZVV3eVduQmlSMVU5

is a link that has been Base64-encoded three times; decoding it step by step gives:

YUhSMGNITTZMeTkzZDNjdWJXVmthV0ZtYVhKbExtTnZiUzltYVd4bEwyNDVOamN6WVROMGVqRnZaV2hxYVM5VVRWSkVVa3haVEU1RVRFUlNRMUpHVkRFMFEwUXhXREV1Y21GeUwyWnBiR1U9 // 1
aHR0cHM6Ly93d3cubWVkaWFmaXJlLmNvbS9maWxlL245NjczYTN0ejFvZWhqaS9UTVJEUkxZTE5ETERSQ1JGVDE0Q0QxWDEucmFyL2ZpbGU= // 2
hxxps://www[.]redacted[.]com/file/n9673a3tz1oehji/TMRDRLYLNDLDRCRFT14CD1X1.rar/file // 3, I have redacted it intentionally
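Peeling off such layers is a simple loop. A minimal Python sketch; the helper name `decode_b64_layers` and the fixed layer count are illustrative assumptions, and to avoid reproducing the redacted link it round-trips a placeholder URL instead:

```python
import base64

def decode_b64_layers(data: str, layers: int) -> str:
    """Base64-decode `data` `layers` times (hypothetical helper)."""
    for _ in range(layers):
        # validate=True rejects characters outside the Base64 alphabet.
        data = base64.b64decode(data, validate=True).decode("ascii")
    return data

# Build a three-layer sample from a placeholder URL.
sample = "https://example.org/file"
for _ in range(3):
    sample = base64.b64encode(sample.encode("ascii")).decode("ascii")

print(sample)
print(decode_b64_layers(sample, 3))  # https://example.org/file
```

Applied to the fragment after `#!` with `layers=3`, the same loop walks through the intermediate strings listed above.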

AdamWr · Oct 10 '24

/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3A/:/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%2F/\//
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3F/?/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3D/=/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)/\$1/

Hello @AdamWr

The first four rules work well to "decode" %3A, %2F, %3F and %3D. But then the last rule does not match.

Take this URL: https://dealabs.digidip.net/visit?url=https%3A%2F%2Fwww.carrefour.fr%2Fp%2Fvalise-caracas-55cm-4-roues-noir-delsey-3219110489859&ppref=https%3A%2F%2Fwww.dealabs.com%2F&ref=ppr-fr-1676838576

After going through your rules, it becomes https://dealabs.digidip.net/visit?url=https://www.carrefour.fr/p/valise-caracas-55cm-4-roues-noir-delsey-3219110489859&ppref=https://www.dealabs.com/&ref=ppr-fr-1676838576 instead of https://www.carrefour.fr/p/valise-caracas-55cm-4-roues-noir-delsey-3219110489859

Apart from those four, it only matches the rule /.*/$permissions=identity-credentials-get=(),document (Filtre AdGuard Anti nuisances), and nothing else.

Any idea why? Your regex looks correct to me.

imTHAI · Nov 23 '24

I see that the rules from my first post:

/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3A/:/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%2F/\//
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3F/?/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3D/=/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)/\$1/

work, but instead of redirecting to:

https://www.carrefour.fr/p/valise-caracas-55cm-4-roues-noir-delsey-3219110489859

it redirects to:

https://www.carrefour.fr/p/valise-caracas-55cm-4-roues-noir-delsey-3219110489859&ppref=https://www.dealabs.com/&ref=ppr-fr-1676838576

The problem seems to be with the last rule. This:

/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*).*/\$1/

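seems to work correctly: the trailing `.*` after the capture group consumes the remaining parameters, so they are replaced along with the rest of the match instead of being kept.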
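The difference comes from how substitution works: only the matched part of the URL is replaced, and anything after the match is kept. A minimal Python sketch of the five-rule chain, assuming each `$urltransform` substitution replaces every match (consistent with all `%2F` sequences being decoded above). Python's `re` stands in for AdGuard's regex engine, `\1` replaces AdGuard's escaped `\$1`, and the rule-matching step before each transform is skipped:

```python
import re

HOST = r"^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net)"

# The four percent-decoding substitutions from the thread, in order.
PERCENT_RULES = [("%3A", ":"), ("%2F", "/"), ("%3F", "?"), ("%3D", "=")]

def apply_chain(url: str, final_pattern: str) -> str:
    for encoded, decoded in PERCENT_RULES:
        url = re.sub(encoded, decoded, url)  # replaces every occurrence
    # Only the span matched by final_pattern is replaced by the captured group.
    return re.sub(final_pattern, r"\1", url)

url = ("https://dealabs.digidip.net/visit?url=https%3A%2F%2Fwww.carrefour.fr%2Fp%2F"
       "valise-caracas-55cm-4-roues-noir-delsey-3219110489859"
       "&ppref=https%3A%2F%2Fwww.dealabs.com%2F&ref=ppr-fr-1676838576")

# Without a trailing `.*`, the `&ppref=...&ref=...` tail survives.
print(apply_chain(url, HOST + r".*url=([^&]*)"))
# With `.*`, the whole URL is matched, so only the captured target remains.
print(apply_chain(url, HOST + r".*url=([^&]*).*"))
```

The first call reproduces the URL with the leftover `&ppref=...` parameters, the second the bare carrefour.fr product URL.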

AdamWr · Nov 23 '24

@AdamWr

/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3A/:/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%2F/\//
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3F/?/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3D/=/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/url=([^&]*)/\$1/

I believe these rules are correct from a regex point of view.

The URL https://dealabs.digidip.net/visit?url=https%3A%2F%2Fwww.carrefour.fr%2Fp%2Fvalise-caracas-55cm-4-roues-noir-delsey-3219110489859&ppref=https%3A%2F%2Fwww.dealabs.com%2F&ref=ppr-fr-1676838576

matches the four decoding rules, but the result is https://dealabs.digidip.net/visit?https://www.carrefour.fr/p/valise-caracas-55cm-4-roues-noir-delsey-3219110489859&ppref=https://www.dealabs.com/search/bons-plans?merchant-id=42&ref=ppr-fr-1677795608 (%3A and %2F were converted correctly, but the last rule only removed "url=").

I tried your rule and then several others, but couldn't come up with anything better.

I'm starting to think that `$urltransform` cannot remove the base part, i.e. the address it acts on.

||example.com^$urltransform=/firstpath/secondpath/

It can transform what comes after example.com, but not example.com itself. I hope I'm wrong. 🙁

imTHAI · Nov 24 '24

This is how it looks on my end with the latest nightly build of AdGuard for Windows:

[Screenshot]

> I start to think that $urltransform cannot delete the base part, the address on which it acts.
>
> ||example.com^$urltransform=/firstpath/secondpath/
>
> It may act and transform things after example.com, but not example.com itself. I hope I'm wrong.

To change the origin, the regular expression inside `$urltransform` must start with `^http`; without it, the origin cannot be changed. See "Changing the origin" in the documentation: https://adguard.com/kb/general/ad-filtering/create-own-filters/#urltransform-modifier:~:text=this%3A%20%5C%2C.-,Changing%20the%20origin,-COMPATIBILITY

So, for example, to redirect from https://example.org/test to https://duckduckgo.com/test, a rule like this can be used:

||example.org^$urltransform=/^https?:\/\/example\.org(\/test)/https:\/\/duckduckgo\.com\$1/
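The substitution part of that rule can be checked with Python's `re` module. This sketch only reproduces the regex replacement (with `\1` in place of AdGuard's escaped `\$1`); it does not model AdGuard's rule matching or its origin-change handling. The helper `transform` is hypothetical, and the second call uses a shortened, made-up carrefour.fr path to mirror the "only `url=` was removed" result reported earlier:

```python
import re

def transform(url: str, pattern: str, replacement: str) -> str:
    """Apply one regex substitution, like a single `$urltransform` step (sketch)."""
    return re.sub(pattern, replacement, url)

# Anchored at ^https?, the pattern covers the origin, so the origin is replaced too.
print(transform("https://example.org/test",
                r"^https?:\/\/example\.org(\/test)", r"https://duckduckgo.com\1"))
# https://duckduckgo.com/test

# Not anchored: only `url=` plus the captured value is rewritten; the origin stays.
print(transform("https://dealabs.digidip.net/visit?url=https://www.carrefour.fr/p/x&ref=1",
                r"url=([^&]*)", r"\1"))
# https://dealabs.digidip.net/visit?https://www.carrefour.fr/p/x&ref=1
```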

AdamWr · Nov 25 '24

OK, thank you. I then tried on Windows (in a VM), pasting in exactly the same rules:

/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3A/:/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%2F/\//
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3F/?/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3D/=/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*).*/\$1/

On macOS, the last rule does not match: [Screenshot 2024-11-25 at 15.55.56] [Screenshot 2024-11-25 at 16.04.30]

But on Windows, it does: [Screenshot 2024-11-25 at 15.56.53] [Screenshot 2024-11-25 at 16.06.14]

How is that possible? Should I open a separate issue?

imTHAI · Nov 25 '24

@AdamWr @sfionov @ameshkov

Proposed changes to $urltransform:

; $urltransform=TRANSFORMS

TRANSFORMS = TRANSFORM | TRANSFORM "|" TRANSFORMS
TRANSFORM = SUBSTITUTE | DECODE
SUBSTITUTE = ... ; `/<regex>/<substitution>/` as it is defined right now.
DECODE = b64 | pct ; `b64` decodes Base64 (incl. URL-safe), `pct` percent-decodes.

Each transformation is applied to the result of the previous transformation.
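A minimal Python sketch of these proposed semantics, assuming each step is either a regex substitution or one of the `b64`/`pct` decoders and that steps run left to right. The step representation, the helper names `b64_decode` and `apply_transforms`, and the error handling are illustrative assumptions, not the CoreLibs implementation. Parsing the `|`-separated rule text is left out; note that a regex such as `(?:effiliation\.com|dealabs\.digidip\.net)` itself contains `|`, so splitting the rule text on `|` alone would not be enough:

```python
import base64
import re
from typing import Union
from urllib.parse import unquote

# A step is either a (regex, replacement) pair or a decoder name: "b64" or "pct".
Step = Union[tuple[str, str], str]

def b64_decode(text: str) -> str:
    """Decode standard or URL-safe Base64, restoring stripped '=' padding."""
    text = text.replace("-", "+").replace("_", "/")
    text += "=" * (-len(text) % 4)
    return base64.b64decode(text, validate=True).decode("utf-8")

def apply_transforms(url: str, steps: list[Step]) -> str:
    """Run each step on the output of the previous one, left to right."""
    for step in steps:
        if step == "b64":
            url = b64_decode(url)
        elif step == "pct":
            url = unquote(url)  # percent-decoding
        else:
            pattern, replacement = step
            url = re.sub(pattern, replacement, url)
    return url

# The original case: extract the `url` parameter, then percent-decode it (`...|pct`).
effiliation = (
    "https://track.effiliation.com/servlet/effi.redir?id_compteur=12305754"
    "&effi_id=1646343493&url=https%3A%2F%2Ffr.shopping.rakuten.com%2Foffer"
    "%2Fshop%2F11769144290%2Fdyson-v8-absolute-aspirateur.html%3FsellerLogin%3DBoulanger"
)
extract = (
    r"^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)",
    r"\1",
)
print(apply_transforms(effiliation, [extract, "pct"]))

# A double-Base64 `redir` value (`...|b64|b64`), built here from a placeholder target.
payload = base64.b64encode(base64.b64encode(b"https://example.com/target")).decode("ascii")
redir = "https://example.org/?redir=" + payload
print(apply_transforms(redir, [(r"^https?:\/\/example\.org\/?\?redir=(.*)$", r"\1"), "b64", "b64"]))
# https://example.com/target
```

Keeping each step a plain function of the previous result is what makes chains like `|b64|b64` compose without any special casing.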

So, for example, the original case becomes:

/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)/\$1/|pct

Another example:

||example.org^*redir=*$urltransform=/^https?:\/\/example\.org\/?\?redir=(.*)\$/\$1/|b64|b64

All in favor?

ngorskikh · Oct 13 '25

Included in CoreLibs release v1.20.53

adguard-bot · Nov 25 '25