Umlaut domains and config file names

Open glueckpress opened this issue 9 years ago • 13 comments

An umlaut domain would read like this:

ipl-geräte-vergleich.de

As a URL in the browser, however, it gets encoded to its ASCII (Punycode) form:

http://xn--ipl-gerte-vergleich-mwb.de/

WP Rocket’s config file needs to be named after this encoded version:

xn--ipl-gerte-vergleich-mwb.de.php

Currently it gets named after the Unicode version, though:

ipl-geräte-vergleich.de.php

Hence, no config file found, no caching.

(Relevant ticket: https://secure.helpscout.net/conversation/265298676/24516/?folderId=714999)

glueckpress avatar Oct 18 '16 15:10 glueckpress

for reference: http://php.net/manual/fr/function.idn-to-ascii.php

remyperona avatar Mar 02 '17 01:03 remyperona

Another case: https://secure.helpscout.net/conversation/700534535/84887?folderId=1213664

WordPresseur avatar Nov 06 '18 13:11 WordPresseur

I think this should be fixed in a major version release so that we can take advantage of our elaborate QA process.

Even though the change is minor (in the grand scheme of things), changes related to config files can break the internet.

Also, hello Caspar 😘

arunbasillal avatar Dec 12 '19 21:12 arunbasillal

@arunbasillal We can do that for 3.7. I guess we need to buy an umlaut domain to be able to do the QA?

GeekPress avatar May 21 '20 09:05 GeekPress

@GeekPress I think we can test using the hosts file on Windows (and something similar on Mac).

arunbasillal avatar May 21 '20 16:05 arunbasillal

It would be good to have a live site with this kind of domain name for testing

remyperona avatar Jun 05 '20 20:06 remyperona

Identify the root cause ✅ Cause is in the original comment for the issue

Scope a solution ✅ when generating the filename for the config file in get_rocket_config_file(), we need to use the idn_to_ascii() function to have the correctly encoded filename.

While this is simple, it's a good opportunity to refactor the code handling the config file generation. The current code is old and very complex.

I recommend creating a new class Config in Engine\Cache to handle this. We will replace & deprecate get_rocket_config_file(), and also update rocket_generate_config_file().

Estimate the effort ✅ I think it's an [M] task, because there is quite a bit of refactoring to do, and new tests to write for all of this

remyperona avatar Jun 05 '20 21:06 remyperona

Related https://secure.helpscout.net/conversation/1217765247/178605?folderId=2675957

camilamadronero-zz avatar Jul 08 '20 15:07 camilamadronero-zz

(sticks head in the door) Respectfully reporting we still have umlauts in German and, as annoying as they may seem, they are not likely to disappear any time soon. 👋🙂 Cheerio! (retreats)

glueckpress avatar Nov 20 '20 21:11 glueckpress

Related: https://secure.helpscout.net/conversation/1945664238/355184/

juricazuanovic avatar Jul 13 '22 11:07 juricazuanovic

Problem Description

When a website uses an internationalized domain name (IDN) with special characters like umlauts (e.g., ipl-geräte-vergleich.de), WP Rocket fails to create the correct configuration file name, resulting in no caching for these domains.

Example

Domain in browser: http://xn--ipl-gerte-vergleich-mwb.de/ (ASCII-encoded version)
Current config file name: ipl-geräte-vergleich.de.php (wrong - uses Unicode version)
Expected config file name: xn--ipl-gerte-vergleich-mwb.de.php (correct - uses ASCII/Punycode version)

Reproduce the problem

To reproduce the issue:

  1. Set up a WordPress site with an IDN domain containing special characters (e.g., German umlauts: ä, ö, ü)
  2. Install and activate WP Rocket
  3. Check the config file generated in wp-content/wp-rocket-config/
  4. Observe that the filename uses the Unicode version of the domain instead of the ASCII/Punycode version
  5. The site will not be cached because WordPress/Apache serves the site under the ASCII domain, but WP Rocket looks for a config file with the Unicode name
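
The mismatch in step 5 can be reproduced without a live domain. The following is a minimal sketch, not WP Rocket code: demo_config_lookup() is a hypothetical helper that creates one file in a throwaway temp directory and then looks it up by the host the request would carry, assuming (as described above) that the request arrives with the ASCII/Punycode host.

```php
<?php
// Illustration only: reproduces the lookup mismatch in a throwaway directory.
// Returns true when a config file matching the request host is found.
function demo_config_lookup( string $generated_name, string $request_host ): bool {
    $dir = sys_get_temp_dir() . '/wp-rocket-config-demo-' . uniqid( '', true ) . '/';
    mkdir( $dir );
    touch( $dir . $generated_name );

    // The lookup builds the file name from the host of the incoming request.
    $found = file_exists( $dir . $request_host . '.php' );

    // Remove the throwaway file and directory again.
    unlink( $dir . $generated_name );
    rmdir( $dir );

    return $found;
}

// Current behavior: file named after the Unicode host, lookup uses the ASCII host.
var_dump( demo_config_lookup( 'ipl-geräte-vergleich.de.php', 'xn--ipl-gerte-vergleich-mwb.de' ) ); // bool(false)

// Expected behavior: both sides use the ASCII/Punycode host.
var_dump( demo_config_lookup( 'xn--ipl-gerte-vergleich-mwb.de.php', 'xn--ipl-gerte-vergleich-mwb.de' ) ); // bool(true)
```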

Note: While we can test using the hosts file locally, having a live test domain would be ideal for QA validation.

Identify the root cause

The issue is in the get_rocket_config_file() function located at inc/functions/files.php around line 178:

foreach ( $urls as $url ) {
    $file                = get_rocket_parse_url( untrailingslashit( $url ) );
    $file['path']        = ( ! empty( $file['path'] ) ) ? str_replace( '/', '.', untrailingslashit( $file['path'] ) ) : '';
    $config_files_path[] = WP_ROCKET_CONFIG_PATH . strtolower( $file['host'] ) . $file['path'] . '.php';
}

The $file['host'] is used directly without converting internationalized domain names to their ASCII/Punycode representation using idn_to_ascii().

Scope a solution

To solve the issue, we need to:

  1. Convert the domain to ASCII/Punycode format when generating the config file name
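  2. Use PHP's idn_to_ascii() function, which is provided by the intl extension. WP Rocket's minimum PHP version (7.2+) supports it, but the extension is not loaded on every host, so the call needs a function_exists() guard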
  3. Add the conversion in the get_rocket_config_file() function before building the filename
  4. Handle existing config files - When updating, we may need to migrate old incorrectly-named config files to the new naming scheme, or simply regenerate them (simpler approach)
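
If the migration route from point 4 were chosen over regeneration, it could look roughly like the sketch below. example_migrate_idn_config_files() is a hypothetical helper, not an existing WP Rocket function, and it assumes that a config file's contents do not depend on its name, which would need to be confirmed first. Regenerating the files remains the simpler option.

```php
<?php
// Hypothetical migration helper, not part of WP Rocket: renames config files
// whose names contain non-ASCII characters to their ASCII/Punycode equivalent.
// Returns a map of old basename => new basename for the files it renamed.
function example_migrate_idn_config_files( string $config_dir ): array {
    $renamed = [];

    // Without the intl extension there is nothing we can convert.
    if ( ! function_exists( 'idn_to_ascii' ) ) {
        return $renamed;
    }

    $files = glob( rtrim( $config_dir, '/' ) . '/*.php' );
    foreach ( $files ?: [] as $old_path ) {
        $name = basename( $old_path, '.php' );

        // Pure-ASCII names are already correct.
        if ( ! preg_match( '/[^\x00-\x7F]/', $name ) ) {
            continue;
        }

        $ascii = idn_to_ascii( $name, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46 );
        if ( false === $ascii ) {
            continue; // Leave names that cannot be converted untouched.
        }

        $new_path = dirname( $old_path ) . '/' . strtolower( $ascii ) . '.php';

        // Never overwrite a config file that already exists under the new name.
        if ( ! file_exists( $new_path ) && rename( $old_path, $new_path ) ) {
            $renamed[ basename( $old_path ) ] = basename( $new_path );
        }
    }

    return $renamed;
}
```

Skipping files whose target name already exists keeps the helper safe to run more than once.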

Solution Approach

Modify get_rocket_config_file() function:

Add IDN to ASCII conversion when building the config file path:

foreach ( $urls as $url ) {
    $file         = get_rocket_parse_url( untrailingslashit( $url ) );
    $file['path'] = ( ! empty( $file['path'] ) ) ? str_replace( '/', '.', untrailingslashit( $file['path'] ) ) : '';
    
    // Convert IDN (internationalized domain names) to ASCII/Punycode format
    $host = $file['host'];
    if ( function_exists( 'idn_to_ascii' ) ) {
        // Pass the variant explicitly: PHP 7.2-7.3 would otherwise default to the deprecated INTL_IDNA_VARIANT_2003
        $ascii_host = idn_to_ascii( $host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46 );
        if ( false !== $ascii_host ) {
            $host = $ascii_host;
        }
    }
    
    $config_files_path[] = WP_ROCKET_CONFIG_PATH . strtolower( $host ) . $file['path'] . '.php';
}

Note on PHP compatibility:

  • The idn_to_ascii() parameter list did not change in PHP 7.4; what changed is the default $variant, from the deprecated INTL_IDNA_VARIANT_2003 to INTL_IDNA_VARIANT_UTS46
  • For PHP 7.2-7.3: idn_to_ascii( string $domain, int $options = IDNA_DEFAULT, int $variant = INTL_IDNA_VARIANT_2003 )
  • For PHP 7.4+: idn_to_ascii( string $domain, int $flags = IDNA_DEFAULT, int $variant = INTL_IDNA_VARIANT_UTS46 )
  • We should use INTL_IDNA_VARIANT_UTS46 (modern standard) with proper error handling
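
One way the "proper error handling" could be sketched is shown below. example_host_to_ascii() is a hypothetical wrapper, not an existing WP Rocket function. It uses the optional fourth argument of idn_to_ascii(), which receives details about the conversion, and falls back to the unchanged host whenever conversion is unavailable or reports an error. The empty-string guard is a precaution, on the assumption that some PHP versions reject an empty domain.

```php
<?php
// Hypothetical wrapper, for illustration only: converts a host to ASCII and
// falls back to the unchanged host when conversion is unavailable or fails.
function example_host_to_ascii( string $host ): string {
    // Nothing to convert; also avoids passing an empty domain to intl.
    if ( '' === $host ) {
        return $host;
    }

    // Both the function and the UTS46 constant come from the intl extension.
    if ( ! function_exists( 'idn_to_ascii' ) || ! defined( 'INTL_IDNA_VARIANT_UTS46' ) ) {
        return $host;
    }

    // The variant is passed explicitly, so the result does not depend on the
    // PHP version's default. $info receives details about the conversion.
    $info  = [];
    $ascii = idn_to_ascii( $host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46, $info );

    // Treat a false return value or any reported error bit as a failure.
    if ( false === $ascii || ! empty( $info['errors'] ) ) {
        return $host;
    }

    return $ascii;
}

// Plain ASCII hosts pass through unchanged.
echo example_host_to_ascii( 'example.com' ), PHP_EOL; // example.com
```

Returning the original host on failure means that sites without the intl extension keep today's behavior instead of breaking.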

Why not create a new class as recommended before?

There was a comment about creating a new Config class in Engine\Cache to handle config file generation. However, after checking:

  • We already have ConfigSubscriber at inc/Engine/Cache/Config/ConfigSubscriber.php that handles config-related hooks
  • This is a focused bug fix, not a full refactor
  • The change is minimal (adding 5-10 lines to convert IDN to ASCII)
  • Creating or refactoring a class would significantly increase the effort without providing immediate value

Effort

S

Miraeld avatar Nov 18 '25 05:11 Miraeld

LGTM @wordpressfan Can we have this refactor here in a CD Sprint?

jeawhanlee avatar Nov 19 '25 08:11 jeawhanlee

I agree we need to move this to the next cooldown, because it touches the most important part of the whole plugin, so it needs a lot more testing.

wordpressfan avatar Nov 20 '25 11:11 wordpressfan