Umbraco.Cloud.Issues

Special characters in URL cause custom redirect to fail on Umbraco Cloud

Open ZupaDev opened this issue 1 year ago • 10 comments

Issue description

Our website is built with Umbraco 10.5.0 and configured to allow Unicode characters in URLs. Here is a snippet containing the RequestHandler section in appsettings.json:

...
  "Umbraco": {
    "CMS": {
      "RequestHandler": {
        "ConvertUrlsToAscii": "false",
        "EnableDefaultCharReplacements": false
      }
    }
  }
...

This setting tells Umbraco to keep all URLs in UTF-8 and not convert them to ASCII. It works correctly, and native non-Latin letters appear in URLs as they should (not URL-encoded). The issue occurs only when such URLs are used in redirects on Umbraco Cloud: the URL becomes encoded after the redirect and the request results in a not-found page. The special Umbraco property type alias umbracoRedirect is used for routing, but the issue is also reproducible with custom redirects.

ZupaDev avatar Jun 21 '23 11:06 ZupaDev

I have tested it further and it seems to be a general Azure issue.

I haven't tried it myself, but you could give something similar to https://stackoverflow.com/questions/27817266/how-to-correctly-use-urlencode-and-decode a go for now and see if that solves it for you :)

RyuLindow avatar Jun 21 '23 11:06 RyuLindow

@RyuLindow thank you for the suggested solution. I tried it for a custom redirect and it works locally and on .NET Fiddle, but unfortunately it has the same problem on Umbraco Cloud: the URL becomes encoded again after the redirect. Besides, it would still be an issue when using the special Umbraco property umbracoRedirect, where the redirect is controlled by Umbraco.

ZupaDev avatar Jun 21 '23 14:06 ZupaDev

Thanks for trying, @ZupaDev

Much appreciated :)

We'll need @sajumb to take it further from here.

RyuLindow avatar Jun 21 '23 15:06 RyuLindow

I have created a product backlog item for us to look at. I am not certain whether it is specifically related to Umbraco Cloud or whether it is something that we want to support. I will update this thread when we have new insight.

sajumb avatar Jun 23 '23 12:06 sajumb

Hi,

Is there any news about this issue? Do you have a rough estimate of when it can be solved?

ZupaDev avatar Aug 08 '23 11:08 ZupaDev

Hi, thank you for your patience as we begin delving into this matter. Our team has taken note of your report and is actively considering avenues for investigation. To be honest, we have not received many reports, as most users either don't run into issues with non-Latin characters in redirects or avoid such characters. To avoid the issue until it's resolved, consider manually URL-encoding non-Latin characters in URLs used for redirects.

Once we have more concrete findings or potential solutions, we will promptly communicate those to you.
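As a rough illustration of the manual URL-encoding workaround mentioned above, here is a minimal Python sketch. The helper name `encode_redirect_path` and the sample path are made up for illustration; in an Umbraco site the same step would be done in C# before issuing the redirect. It assumes the target should be percent-encoded as UTF-8, which is what browsers and RFC 3986 expect.

```python
from urllib.parse import quote


def encode_redirect_path(path: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8, keeping '/' intact.

    Hypothetical helper: the name and usage are illustrative only.
    """
    return quote(path, safe="/", encoding="utf-8")


# Sample path (made up): the Danish letter 'æ' is encoded as two UTF-8 bytes.
print(encode_redirect_path("/nyheder/vækst"))  # /nyheder/v%C3%A6kst
```

Whether the encoded form survives the redirect on Umbraco Cloud unchanged is not confirmed by this thread, so it should be tested in the target environment.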

sajumb avatar Aug 14 '23 11:08 sajumb

Hi, we are starting to look into this in more detail. To better assist you, could you please provide some examples of the URLs that haven't worked as expected after the redirect? This will help me understand the specific encoding issues you're facing and propose a more targeted solution. Please post these here or in a DM. Thanks!

sajumb avatar Oct 09 '23 06:10 sajumb

We will be closing this task within the next few days unless we get more details and a how-to-reproduce. Thank you for your understanding.

sajumb avatar Oct 31 '23 19:10 sajumb

Hi Søren, sorry for the delayed reply; I missed the notifications about your comments.

We are using native Danish characters in URLs on our website, without converting them to ASCII. For example, /nyheder/optimistisk-syn-på-inflation-og-vækst-gav-positiv-start-på-2023. Such a URL works fine when opened in a browser.

However, if a redirect is applied to a page with such a URL, either by using the special Umbraco property type alias umbracoRedirect or by the Redirect method in a Controller, all the special characters in the URL are replaced with percent-encodings. The resulting URL looks like this: /nyheder/optimistisk-syn-p%E5-inflation-og-v%E6kst-gav-positiv-start-p%E5-2023, and it results in a 404 page.
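One detail worth noting in the failing URL above: %E5 and %E6 are single-byte encodings of å and æ, which matches Latin-1 (ISO-8859-1) rather than UTF-8. The Python sketch below only demonstrates that difference on a shortened version of the path; it does not show where in the redirect chain the encoding is applied, which remains an open question in this thread.

```python
from urllib.parse import quote, unquote

# Shortened version of the example path from the comment above.
path = "/nyheder/optimistisk-syn-på-inflation"

utf8_form = quote(path, safe="/", encoding="utf-8")
latin1_form = quote(path, safe="/", encoding="latin-1")

print(utf8_form)    # /nyheder/optimistisk-syn-p%C3%A5-inflation
print(latin1_form)  # /nyheder/optimistisk-syn-p%E5-inflation

# A server that decodes the Latin-1 form as UTF-8 cannot recover 'å',
# so the decoded path no longer matches the published URL.
decoded = unquote(latin1_form, encoding="utf-8", errors="replace")
print(decoded == path)  # False
```

If the server decodes request paths as UTF-8, a lone %E5 byte is invalid, which would be consistent with the 404.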

ZupaDev avatar Nov 13 '23 07:11 ZupaDev

Hi @ZupaDev,

Thanks for your patience as we've been addressing the issue with non-Latin characters in URLs during redirects on Umbraco Cloud. After our investigation, we've identified a strategy that should help resolve this challenge.

The core of the issue lies in the way Azure App Services, which Umbraco Cloud relies on, handles URL encoding. To counter this, we suggest implementing a URL rewriting strategy using ASP.NET Core's Microsoft.AspNetCore.Rewrite.IRule and Microsoft.AspNetCore.Builder.UseRewriter. Here's how you can conceptually approach this:

  1. Define Custom Rewrite Rules: Using Microsoft.AspNetCore.Rewrite.IRule, you can define custom rules for how URLs should be rewritten. This allows for specific handling of non-Latin characters, ensuring they retain their original form after redirects.

  2. Implement the Rewrite Logic: With Microsoft.AspNetCore.Builder.UseRewriter, these rules can be applied to incoming requests. This middleware will intercept requests and apply your custom rewrite rules, effectively managing URLs with special characters.

  3. Testing and Adjustment: It's crucial to test these rewrite rules in a controlled environment. This step ensures that the rules work as expected and do not unintentionally affect other URL patterns.
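The steps above do not spell out what a rewrite rule would actually do. As a language-neutral sketch of one possible rule body, the Python below normalizes an incoming path: it accepts valid UTF-8 percent-encoding as-is and falls back to Latin-1 for single-byte sequences such as %E5. The function name `normalize_path` and the fallback strategy are assumptions, not something confirmed in this thread; in ASP.NET Core the equivalent logic would presumably live in an IRule implementation registered through UseRewriter.

```python
from urllib.parse import quote, unquote_to_bytes


def normalize_path(raw_path: str) -> str:
    """Return the path percent-encoded as UTF-8.

    Hypothetical helper: tries UTF-8 first, then assumes Latin-1.
    """
    raw = unquote_to_bytes(raw_path)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes like 0xE5 on their own are not valid UTF-8; treat as Latin-1.
        text = raw.decode("latin-1")
    return quote(text, safe="/", encoding="utf-8")


print(normalize_path("/nyheder/v%E6kst"))     # /nyheder/v%C3%A6kst
print(normalize_path("/nyheder/v%C3%A6kst"))  # /nyheder/v%C3%A6kst
```

The fallback is a heuristic: a Latin-1 byte sequence that happens to be valid UTF-8 would be misread, so a rule like this needs testing against the site's real URLs.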

Here is an example from the appsettings.json file, illustrating how these rewrite rules can be defined:

"Umbraco": {
  "CMS": {
    "RequestHandler": {
      "UserDefinedCharCollection": [
        {"Char": "æ", "Replacement": "ae"},
        {"Char": "ø", "Replacement": "oe"},
        {"Char": "å", "Replacement": "aa"}
        // Additional mappings can be added as needed
      ]
    }
  }
}

In this example, specific non-Latin characters are mapped to their desired ASCII representations. These mappings are then used in the rewrite rules to ensure URLs are transformed appropriately.
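To make the effect of such a mapping concrete, here is a small Python sketch that applies the three replacements from the example to a URL segment. It is only an illustration of character substitution; the names `CHAR_REPLACEMENTS` and `to_ascii_segment` are made up, and Umbraco's own handling of the setting may differ (for instance for uppercase letters).

```python
# Mirrors the three mappings in the appsettings.json example above.
CHAR_REPLACEMENTS = {"æ": "ae", "ø": "oe", "å": "aa"}


def to_ascii_segment(segment: str) -> str:
    """Replace each mapped character; leave everything else untouched."""
    return "".join(CHAR_REPLACEMENTS.get(ch, ch) for ch in segment)


print(to_ascii_segment("optimistisk-syn-på-inflation-og-vækst"))
# optimistisk-syn-paa-inflation-og-vaekst
```

Note that this approach replaces the native characters with ASCII stand-ins in the URL, rather than preserving them.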

sajumb avatar Jan 18 '24 13:01 sajumb

Hello everyone - I'm closing this issue down, as a solution has been provided and there has been no activity since.

If this is still an issue, feel free to reopen it.

Kind regards - Mikkel

mikkelhm avatar Jul 24 '24 07:07 mikkelhm