Add an option to decode URL in `$urltransform`
Issue Details
It's related to - https://github.com/AdguardTeam/CoreLibs/issues/1557#issuecomment-2351459285
Currently, if we want to redirect to another origin from a link that contains the destination page as one of its parameters, and that part of the URL is encoded, then some characters have to be decoded first. For example, this link:
https://track.effiliation.com/servlet/effi.redir?id_compteur=12305754&effi_id=1646343493&url=https%3A%2F%2Ffr.shopping.rakuten.com%2Foffer%2Fshop%2F11769144290%2Fdyson-v8-absolute-aspirateur.html%3FsellerLogin%3DBoulanger
The destination page is in the `url` parameter, but it's encoded:
https%3A%2F%2Ffr.shopping.rakuten.com%2Foffer%2Fshop%2F11769144290%2Fdyson-v8-absolute-aspirateur.html%3FsellerLogin%3DBoulanger
so some characters need to be decoded:
%3A
%2F
%3F
%3D
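For reference, here is a minimal Python sketch of what a "decode URL" step would have to produce for the link above. It only illustrates percent-decoding with the standard library and is not how AdGuard implements anything; the helper name `destination` is made up for this example.

```python
from urllib.parse import parse_qs, unquote, urlsplit

link = (
    "https://track.effiliation.com/servlet/effi.redir"
    "?id_compteur=12305754&effi_id=1646343493"
    "&url=https%3A%2F%2Ffr.shopping.rakuten.com%2Foffer%2Fshop%2F11769144290"
    "%2Fdyson-v8-absolute-aspirateur.html%3FsellerLogin%3DBoulanger"
)


def destination(url: str, param: str = "url") -> str:
    """Return the percent-decoded value of `param` (hypothetical helper)."""
    query = urlsplit(url).query
    # parse_qs splits the query on "&" and percent-decodes each value once.
    return parse_qs(query)[param][0]


# The individual escapes handled by the rules in this issue.
escapes = unquote("%3A %2F %3F %3D")

print(destination(link))
print(escapes)
```

The second `print` shows the four escapes mapped to `:`, `/`, `?` and `=`, which is what the separate substitution rules do one character at a time.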
These rules:
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3A/:/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%2F/\//
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3F/?/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3D/=/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)/\$1/
seem to work fine, but if we had a decode URL option, we could use just something like:
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)/\$1/decodeURL
Proposed solution
Add an option to decode the URL, maybe as an additional modifier. Or, if it's already possible or can easily be done in one rule, it would be nice to add it to the documentation.
I think it would be best if this was a separate modifier:
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)/\$1/,decodeurl
Or it could probably be an enhancement to the `$urltransform` modifier itself, so that it decodes the redirect target before redirecting to it.
The ability to decode base64 would also be nice.
Or even better, an option to decode base64 several times. A real case is here - https://github.com/AdguardTeam/AdguardFilters/issues/190685 This website contains links like this one:
https://b.myfirstdollar.org/#!WVVoU01HTklUVFpNZVRrelpETmpkV0pYVm10aFYwWnRZVmhLYkV4dFRuWmlVemx0WVZkNGJFd3lORFZPYW1ONldWUk9NR1ZxUm5aYVYyaHhZVk01VlZSV1NrVlZhM2hhVkVVMVJWUkZVbE5STVVwSFZrUkZNRkV3VVhoWFJFVjFZMjFHZVV3eVduQmlSMVU5
This part:
WVVoU01HTklUVFpNZVRrelpETmpkV0pYVm10aFYwWnRZVmhLYkV4dFRuWmlVemx0WVZkNGJFd3lORFZPYW1ONldWUk9NR1ZxUm5aYVYyaHhZVk01VlZSV1NrVlZhM2hhVkVVMVJWUkZVbE5STVVwSFZrUkZNRkV3VVhoWFJFVjFZMjFHZVV3eVduQmlSMVU5
is a link encoded in base64 several times:
YUhSMGNITTZMeTkzZDNjdWJXVmthV0ZtYVhKbExtTnZiUzltYVd4bEwyNDVOamN6WVROMGVqRnZaV2hxYVM5VVRWSkVVa3haVEU1RVRFUlNRMUpHVkRFMFEwUXhXREV1Y21GeUwyWnBiR1U9 // 1
aHR0cHM6Ly93d3cubWVkaWFmaXJlLmNvbS9maWxlL245NjczYTN0ejFvZWhqaS9UTVJEUkxZTE5ETERSQ1JGVDE0Q0QxWDEucmFyL2ZpbGU= // 2
hxxps://www[.]redacted[.]com/file/n9673a3tz1oehji/TMRDRLYLNDLDRCRFT14CD1X1.rar/file // 3, I have redacted it intentionally
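As a rough illustration of "decode base64 several times", here is a short Python sketch. It deliberately uses a placeholder URL (`https://example.org/file`) and a made-up `#!` link instead of the strings above, and the helper names are hypothetical; it says nothing about how AdGuard would implement this.

```python
import base64


def b64_decode_once(text: str) -> str:
    """Decode one Base64 layer; accepts standard and URL-safe alphabets."""
    # Restore padding in case the "=" characters were stripped from the link.
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def b64_decode_layers(text: str, layers: int) -> str:
    """Apply b64_decode_once `layers` times in a row."""
    for _ in range(layers):
        text = b64_decode_once(text)
    return text


# Build a three-layer sample from a placeholder URL.
original = "https://example.org/file"
encoded = original
for _ in range(3):
    encoded = base64.b64encode(encoded.encode("utf-8")).decode("ascii")

# Hypothetical link in the same "#!<payload>" shape as the one above.
sample_link = "https://example.org/#!" + encoded
payload = sample_link.split("#!", 1)[1]

print(b64_decode_layers(payload, 3))
```

The number of layers has to be known in advance here, which is why an option to repeat the decoding step a fixed number of times would be enough for this case.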
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3A/:/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%2F/\//
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3F/?/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3D/=/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)/\$1/
Hello @AdamWr
The first 4 rules work well to "decode" %3A, %2F, %3F and %3D, but then the URL does not match the last rule.
Take this URL:
https://dealabs.digidip.net/visit?url=https%3A%2F%2Fwww.carrefour.fr%2Fp%2Fvalise-caracas-55cm-4-roues-noir-delsey-3219110489859&ppref=https%3A%2F%2Fwww.dealabs.com%2F&ref=ppr-fr-1676838576
After going through your rules it becomes:
https://dealabs.digidip.net/visit?url=https://www.carrefour.fr/p/valise-caracas-55cm-4-roues-noir-delsey-3219110489859&ppref=https://www.dealabs.com/&ref=ppr-fr-1676838576
but not:
https://www.carrefour.fr/p/valise-caracas-55cm-4-roues-noir-delsey-3219110489859
It matches the rule /.*/$permissions=identity-credentials-get=(),document (AdGuard Annoyances filter) and that's all.
Any idea why? Your regex seems correct to me.
I see that the rules from my first post:
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3A/:/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%2F/\//
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3F/?/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3D/=/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)/\$1/
work, but instead of redirecting to:
https://www.carrefour.fr/p/valise-caracas-55cm-4-roues-noir-delsey-3219110489859
it redirects to:
https://www.carrefour.fr/p/valise-caracas-55cm-4-roues-noir-delsey-3219110489859&ppref=https://www.dealabs.com/&ref=ppr-fr-1676838576
The problem seems to be with the last rule. This:
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*).*/\$1/
seems to work correctly.
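The difference between the two versions of the last rule can be reproduced with plain regex substitution. The Python sketch below assumes the substitution behaves like `re.sub` on the already-decoded URL (with `\1` in place of `\$1`), which matches what is described above but is not taken from the CoreLibs code.

```python
import re

# URL after the four percent-decoding rules have been applied.
decoded = (
    "https://dealabs.digidip.net/visit?url=https://www.carrefour.fr/p/"
    "valise-caracas-55cm-4-roues-noir-delsey-3219110489859"
    "&ppref=https://www.dealabs.com/&ref=ppr-fr-1676838576"
)

HOST = r"^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net)"

# Original last rule: the match stops at the first "&", so only that part is
# replaced and everything after it stays in the result.
without_tail = re.sub(HOST + r".*url=([^&]*)", r"\1", decoded)

# Fixed rule: the trailing ".*" makes the match cover the rest of the URL too,
# so the whole string is replaced by the captured destination.
with_tail = re.sub(HOST + r".*url=([^&]*).*", r"\1", decoded)

print(without_tail)
print(with_tail)
```

Only the matched span is replaced, so without the trailing `.*` the `&ppref=...&ref=...` part is left untouched and ends up appended to the destination.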
@AdamWr
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3A/:/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%2F/\//
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3F/?/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3D/=/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/url=([^&]*)/\$1/
These are the rules that I believe are correct from a regex point of view.
The URL
https://dealabs.digidip.net/visit?url=https%3A%2F%2Fwww.carrefour.fr%2Fp%2Fvalise-caracas-55cm-4-roues-noir-delsey-3219110489859&ppref=https%3A%2F%2Fwww.dealabs.com%2F&ref=ppr-fr-1676838576
matches the 4 rules, but then it gives
https://dealabs.digidip.net/visit?https://www.carrefour.fr/p/valise-caracas-55cm-4-roues-noir-delsey-3219110489859&ppref=https://www.dealabs.com/search/bons-plans?merchant-id=42&ref=ppr-fr-1677795608
(It correctly converted the %3A%2F, but then it just removed the "url=".)
I tried with your rule and then I tried with other rules but I didn't come up with anything better.
I'm starting to think that $urltransform cannot delete the base part, i.e. the address on which it acts.
||example.com^$urltransform=/firstpath/secondpath/
It may act and transform things after example.com, but not example.com itself. I hope I'm wrong. 🙁
This is how it looks on my end with the latest nightly build of AdGuard for Windows:
[Screenshot]
To change the origin, the regexp in $urltransform has to start with ^http - https://adguard.com/kb/general/ad-filtering/create-own-filters/#urltransform-modifier:~:text=this%3A%20%5C%2C.-,Changing%20the%20origin,-COMPATIBILITY
So for example to redirect from https://example.org/test to https://duckduckgo.com/test a rule like this:
||example.org^$urltransform=/^https?:\/\/example\.org(\/test)/https:\/\/duckduckgo\.com\$1/
can be used.
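A simplified Python model of this behaviour, as I understand it from the linked documentation: a regexp that starts with `^http` is applied to the full URL, anything else only to the path and query. The function name `apply_urltransform` and the use of `re.sub` are assumptions made for this sketch, not the actual implementation.

```python
import re
from urllib.parse import urlsplit


def apply_urltransform(url: str, regex: str, replacement: str) -> str:
    """Toy model of a single $urltransform substitution (hypothetical)."""
    if regex.startswith("^http"):
        # Origin-changing form: the whole URL is searched and may be rewritten.
        return re.sub(regex, replacement, url)
    # Default form: the origin is kept and only path + query are rewritten.
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.netloc}"
    rest = url[len(origin):]
    return origin + re.sub(regex, replacement, rest)


# The example.org -> duckduckgo.com rule from above.
moved = apply_urltransform(
    "https://example.org/test",
    r"^https?:\/\/example\.org(\/test)",
    r"https://duckduckgo.com\1",
)

# A regexp without ^http only drops "url=" and cannot touch the origin.
kept = apply_urltransform(
    "https://dealabs.digidip.net/visit?url=https://www.carrefour.fr/p/x&ref=1",
    r"url=([^&]*)",
    r"\1",
)

print(moved)
print(kept)
```

The second call mirrors the result reported earlier in the thread, where the rule ending in `/url=([^&]*)/\$1/` removed only the "url=" part.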
OK, thank you. So I tried it on Windows (in a VM). I applied (copy-pasted) the exact same rules:
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3A/:/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%2F/\//
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3F/?/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/%3D/=/
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*).*/\$1/
On macOS, it does not match the last rule:
But on Windows, it does:
How is that possible? Should I open an issue?
@AdamWr @sfionov @ameshkov
Proposed changes to $urltransform:
; $urltransform=TRANSFORMS
TRANSFORMS = TRANSFORM | TRANSFORM "|" TRANSFORMS
TRANSFORM = SUBSTITUTE | DECODE
SUBSTITUTE = ... ; `/<regex>/<substitution>/` as it is defined right now.
DECODE = b64 | pct ; `b64` decodes Base64 (incl. URL-safe), `pct` percent-decodes.
Each transformation is applied to the result of the previous transformation.
So, for example, the original case becomes:
/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:track\.effiliation\.com\/servlet\/effi\.redir|dealabs\.digidip\.net\/visit\?url=)/$urltransform=/^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)/\$1/|pct
Another example:
||example.org^*redir=*$urltransform=/^https?:\/\/example\.org\/\?redir=(.*)\$/\$1/|b64|b64
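To make the chaining concrete, here is a small Python sketch of the proposed pipeline, where each transform receives the result of the previous one. The step names `pct` and `b64` come from the proposal; everything else (`substitute`, `run_pipeline`, building the steps as a Python list instead of parsing the rule text, and the `https://example.org/?redir=` URL shape) is an assumption for illustration only.

```python
import base64
import re
from typing import Callable, List
from urllib.parse import unquote

Transform = Callable[[str], str]


def substitute(regex: str, replacement: str) -> Transform:
    """SUBSTITUTE step, modelled here with re.sub."""
    return lambda text: re.sub(regex, replacement, text)


def pct(text: str) -> str:
    """DECODE step `pct`: percent-decode once."""
    return unquote(text)


def b64(text: str) -> str:
    """DECODE step `b64`: Base64-decode once (standard or URL-safe)."""
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def run_pipeline(url: str, steps: List[Transform]) -> str:
    """Apply each transform to the result of the previous one."""
    for step in steps:
        url = step(url)
    return url


# Original case: extract the `url` parameter, then percent-decode it.
link = (
    "https://track.effiliation.com/servlet/effi.redir"
    "?id_compteur=12305754&effi_id=1646343493"
    "&url=https%3A%2F%2Ffr.shopping.rakuten.com%2Foffer%2Fshop%2F11769144290"
    "%2Fdyson-v8-absolute-aspirateur.html%3FsellerLogin%3DBoulanger"
)
extract = substitute(
    r"^https?:\/\/(?:[a-z0-9-]+\.)*?(?:effiliation\.com|dealabs\.digidip\.net).*url=([^&]*)",
    r"\1",
)
first = run_pipeline(link, [extract, pct])

# Second example: a target that was Base64-encoded twice (placeholder data).
target = "https://example.com/landing"
twice = base64.b64encode(base64.b64encode(target.encode())).decode("ascii")
redir = substitute(r"^https?:\/\/example\.org\/\?redir=(.*)$", r"\1")
second = run_pipeline("https://example.org/?redir=" + twice, [redir, b64, b64])

print(first)
print(second)
```

Repeating a step, as in `b64|b64`, is just the same function appearing twice in the list, so no separate "decode N times" option is needed in this model.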
All in favor?
Included in CoreLibs release v1.20.53