kirby icon indicating copy to clipboard operation
kirby copied to clipboard

'intl' date handler outputs random translation on multilanguage site

Open iamlinkus opened this issue 2 years ago • 24 comments

Description

Using the new intl date handler on a multilanguage site, the date output language is picked randomly from the available languages.

Expected behavior
Date output language correponds to the current selected language.

Video of the behavior
See video

To reproduce

On a fresh install of Kirby 3.6.6

  1. Configure the site to have languages available and intl as the date handler:
// config.php
return [
  'languages' => true,
  'date.handler'  => 'intl',
]
  1. Add two languages:
// languages/lt.php
return [
    'code' => 'lt',
    'default' => true,
    'direction' => 'ltr',
    'locale' => [
        'LC_ALL' => 'lt_LT'
    ],
    'name' => 'Lietuvių',
]
// languages/en.php
return [
    'code' => 'en',
    'default' => false,
    'direction' => 'ltr',
    'locale' => [
        'LC_ALL' => 'en_US'
    ],
    'name' => 'English',
]
  1. Ouput the date with month as a word:
// somewhere in the template
<span><?= $page->date()->toDate('Y, LLLL') ?></span>
  1. Reload browser multiple times. (Sometimes it stays in one language for 10 reloads, sometimes it changes every second reload).

iamlinkus avatar May 11 '22 12:05 iamlinkus

This issue doesn't seem to be related to multi-language sites in particular. When changing the locale in a no-language installation from one locale to the next, e.g DE => EN, the same happens, or it doesn't change at all, as if something was cached somewhere.

texnixe avatar May 12 '22 11:05 texnixe

Does it only happen with intl or also with strftime?

lukasbestle avatar May 13 '22 10:05 lukasbestle

Does it only happen with intl or also with strftime?

Only with intl. strftime works as expected.

iamlinkus avatar May 13 '22 11:05 iamlinkus

Thank you, that's interesting. So something must be off with our intl implementation in particular. Because I can't imagine that it's PHP's fault. Or maybe it's a combination of weird circumstances?

For reference, the magic happens here:

https://github.com/getkirby/kirby/blob/005fc78fcc0e7f99f61e6973efa8ea8ec85741d9/src/Toolkit/Str.php#L279-L293

lukasbestle avatar May 15 '22 20:05 lukasbestle

Maybe we need to explicitly pass the locale?

distantnative avatar May 16 '22 06:05 distantnative

As a workaround, kirby allows you to pass your own intldateformatter instance, there you can specify the exact locale you wish:

<?php 
$ylllFormatter = new IntlDateFormatter(
    kirby()->languageCode(), 
    IntlDateFormatter::FULL, 
    IntlDateFormatter::FULL,
    null, // or an explicit timezone: 'Europe/Berlin',
    null, // or IntlDateFormatter::GREGORIAN,
    'Y, LLLL'
);
?>

....
<span><?= $page->date()->toDate($ylllFormatter) ?></span>

rasteiner avatar May 16 '22 14:05 rasteiner

I think @distantnative idea makes sense. The ::formatObject() method works great as it takes locale as the 3rd parameter.

// `intl` handler
if ($handler === 'intl') {
    $datetime = new DateTime();

    if ($time !== null) {
        $datetime->setTimestamp($time);
    }
    
    $locale = option('locale');
    
    if($language = kirby()->language()) {
        $locale = $language->locale();
    }
    
    if (is_array($locale) === true) {
        $locale = $locale['LC_TIME'] ?? $locale['LC_ALL'] ?? $locale[0] ?? current($locale);
    }

    return IntlDateFormatter::formatObject($datetime, $format, $locale);
}

afbora avatar May 24 '22 21:05 afbora

With this solution we need to be careful to use App::instance(null, true). Otherwise non-CMS use cases would load the CMS code. Another slight disadvantage: IntlDateFormatter objects that get passed by the user won't benefit from this.

Alternative solution (not tested yet, just an idea): We could set the default locale when the system is loaded. Issue: This can only be set once and not overridden. This could complicate things.

Another thing to keep in mind for both solutions: The ICU locale used by intl is not the same as the system locale. If I understand it correctly, the only difference is that ICU doesn't use the .UTF8 suffix.

lukasbestle avatar May 25 '22 20:05 lukasbestle

@lukasbestle

If I understand it correctly, the only difference is that ICU doesn't use the .UTF8 suffix.

afaik, technically the system locales refer to actual file names. It could be "UTF8", "UTF-8", no extension at all, etc. There also locales like "C" or "POSIX"... While ICU locales just use ISO language codes (different variants even), with or without a region code, even in combinations that don't really makes sense (like "Swiss German in Iraq"): https://onlinephp.io/c/ff4bc Looks like it's pretty forgiving

edit: actually it looks like it just ignores suffixes that it doesn't understand: https://onlinephp.io/c/7757cc

rasteiner avatar May 26 '22 11:05 rasteiner

Thank you for your research, then we at least don't need to do anything special regarding locales.

lukasbestle avatar May 26 '22 20:05 lukasbestle

It would be great if we could somehow recreate the bug as unit test, to make sure our fix really covers it and we don't reintroduce the problem later.

distantnative avatar May 27 '22 06:05 distantnative

What I find weird is that the behavior is random in the first place. If no locale is passed, shouldn't intl use its internal default locale? Where does this come from and why is not deterministic?

lukasbestle avatar May 27 '22 14:05 lukasbestle

Random intl language switch was happening to me even on single language website. My wild guess from testing it has to do with something process low level. It was happening between projects when i was trying to replicate it when i didn't explicitly set intl language.

I think two possible solutions A. Set default intl language using locale_set_default() somewhere B. Passing default IntlDateFormatter instance globally in config (instead of passing 'intl')

Reason for B. might be timezones and other settings. I am not sure how they play into this but from reading IntlDateFormatter::formatObject docs its pretty unexplicit how it sets details.

A. locale_set_default

@lukasbestle

We could set the default locale when the system is loaded. Issue: This can only be set once and not overridden. This could complicate things.

Why you think so?

The default value is empty, which forces the usage of ICU's default locale. Once set, the ini setting cannot be reset to this default value. It is not recommended that this default be relied on, as its effective value depends on the server's environment.

"this default value" i think it means you can't get that initial server env value back. Seems like it should be possible to overwrite. It also says "It is not recommended that this default be relied on" (which is current situation).

Simply setting locale_set_default() in config/index seems to fix everything for me. locale_set_default(option('locale')) sets cs_CZ.UTF-8 and works even though i don't have it in ResourceBundle::getLocales(''); only cs_CZ. This is mentioned above. Not sure if its too dangerous (probably?). Wild fix i am succesfully using

    public static function set($locale): void
    {
        $locale = static::normalize($locale);


        foreach ($locale as $key => $value) {
            setlocale($key, $value);
        }

        // set locale to LC_TIME or LC_ALL
        locale_set_default($locale[5] ?? $locale[0]);
    }

https://github.com/getkirby/kirby/blob/6b20fa11843f57cd9a1e611bc9e8e8a91b855156/src/Toolkit/Locale.php#L139

It might not be great idea to tie the C locales with ICU so maybe

'date'  => [
    'handler' => 'intl',
    'locale' => 'cs_CZ'
 ],

But that will be confusing for users because you have two locales.
I like solution 2. because it's most flexible but again bad for newcomers (and possible overwrite locally as @rasteiner).

iskrisis avatar May 28 '22 17:05 iskrisis

"this default value" i think it means you can't get that initial server env value back. Seems like it should be possible to overwrite.

@iskrisis Ah, you are absolutely right. I misread this part. So I think we should be fine here.

I really like your proposed solution in Toolkit\Locale::set(). It's really simple and fixes the issue not only for simple date formatting with a format string, but also for custom IntlDateFormatter objects. Since LC_TIME is only used for strftime() according to the PHP docs, I think it is not a problem to "recycle" this constant for our intl use. If the dev sets the date handler to intl, they won't need strftime(), so they can set the LC_TIME constant to whatever ICU locale they need.

Only change I suggest in your code example: Instead of the magic numbers 5 and 0, we should use LC_TIME and LC_ALL. The values of these constants can differ between systems. The normalization code in the Locale class already makes sure that the array keys use the constant values of the current system.

lukasbestle avatar May 29 '22 11:05 lukasbestle

@lukasbestle my implementation with array positions is bad. I just didn't know better the $locale is ordered like this for reason and i am not sure why.

Hopefully reusing LC_TIME isn't too clever.

What about timezones? I think it might lead to bugs because of winter/summer time and the conversion from UTC. So same situation with bad setting comming from server. Again date_default_timezone_set() exists so i guess users are fine. Just that it should probably be mentioned in docs.

Introducing explicit 'intl' settings for ICU

'date'  => [
    'handler' => 'intl',
    'locale' => 'cs_CZ',
    'timezone' => 'Europe/Berlin',
 ],

Would be option but i understand why not introducing new settings is better.

iskrisis avatar May 29 '22 16:05 iskrisis

To be honest I think timezones are out of scope of this feature. If we went the route of config options, then we could cover this as well, but then we would also need to add support for this to multilang.

lukasbestle avatar May 29 '22 18:05 lukasbestle

As a workaround, kirby allows you to pass your own intldateformatter instance, there you can specify the exact locale you wish:

<?php 
$ylllFormatter = new IntlDateFormatter(
    kirby()->languageCode(), 
    IntlDateFormatter::FULL, 
    IntlDateFormatter::FULL,
    null, // or an explicit timezone: 'Europe/Berlin',
    null, // or IntlDateFormatter::GREGORIAN,
    'Y, LLLL'
);
?>

....
<span><?= $page->date()->toDate($ylllFormatter) ?></span>

I have extended the code a bit so that it can be used more variably. Passing format for different outputs.

// date fix

function datefix($format, $date)
{
  $ylllFormatter = new IntlDateFormatter (
    kirby()->languageCode(), 
    IntlDateFormatter::FULL, 
    IntlDateFormatter::FULL,
    null,
    null,
    $format
  );
  return $ylllFormatter->format($date);
}
...

$date = new DateTime();
$currentDate = datefix( 'EEEE, d. MMMM YYYY', $date );
echo $currentDate;

Messa1 avatar Jul 11 '22 17:07 Messa1

I think we can solve this bug like @iskrisis has suggested here (only with the constants instead of magic numbers): https://github.com/getkirby/kirby/issues/4304#issuecomment-1140301619

Regarding timezones: I think for single-language sites, the timezone definition belongs in php.ini because PHP spits out warnings when no timezone is set. For multi-language sites, we can offer a timezone property for the language files. That's quite simple to do.

lukasbestle avatar Aug 03 '22 10:08 lukasbestle

Does setting timezones makes sense in multi language context? It might be that i was completely wrong. I wonder if there is case where i would have a website that's living in multiple timezones. There probably is but whats the issue here? To present the dates in such case i think you can be expected to do this manually because it's advanced. And saving the dates should stay same in all timezones right?

Kirby lets you also set locale to make things easier even though you could also set it in php.ini and other places. Not sure why timezone is different.

I would avoid setting timezones in multi language while not providing same settings in single language mode. You might end up with people using multi language with one language just to fix their timezones. Better to send them straight to setting it outside of kirby.

Then again we are only talking about INTL. I am not sure how that relates to php dates/times.

This seems pretty complicated BUT i think it should have the casual user in mind as priority. Those might not even realize there is a problem while most advanced users already found solutions outside of Kirby.

I agree

  1. locale needs to also set locale_set_default.
  2. locale_set_default should be set also for multi languages

Maybe thats enough?

iskrisis avatar Aug 03 '22 11:08 iskrisis

For this specific issue, certainly. I was thinking about creating a Nolt idea ticket for the timezone option, but it is really simple to implement, so we can combine it with the Intl fix.

Does setting timezones makes sense in multi language context? It might be that i was completely wrong. I wonder if there is case where i would have a website that's living in multiple timezones. There probably is but whats the issue here? To present the dates in such case i think you can be expected to do this manually because it's advanced. And saving the dates should stay same in all timezones right?

Yes, it doesn't change anything about the times that are saved. It's only about the output in the frontend. And there I can see cases where you may want to link the timezone to a language (where a language may not need to be just a translation, but a localization, e.g. you may have different languages for different countries like en-us, en-uk, en-au etc.). If an advanced user wants to set the timezone differently, they can still do. But then it really gets advanced. So the option to override the global php.ini timezone per language can be a solution for less complex use cases.

lukasbestle avatar Aug 03 '22 16:08 lukasbestle

Coming back to issue i had with Intl - i needed to set it on single language page. This would apply to timezones as well. So it seems to me if you introduce it in multi language it should be possible in single language as well. The problem is the possible confusion - Kirby mixes C locale with Intl locale in option.locale. But option.timezone would be only for intl it implies that it will be changing the timezone Kirby uses for saving dates.

Not sure how to solve it but it doesn't feel great to avoid it entirely. Like i said i think there will be people confused why they can't set the timezone in single language and they would be right.

iskrisis avatar Aug 03 '22 16:08 iskrisis

To make it consistent for me it would mean either

  • set date_default_timezone_set with option.timezone so it would affect both PHP date and Intl
  • not mix option.locale with Intl and use option.intl and have multi language set intl and locale independently too

Probably i am overthinking this because the cases are rare.

iskrisis avatar Aug 03 '22 16:08 iskrisis

Probably we should only fix the locale issue for now and leave the timezones unaffected. :)

lukasbestle avatar Aug 03 '22 20:08 lukasbestle

Agree let someone else to fige this out 🗡️

iskrisis avatar Aug 03 '22 21:08 iskrisis

✅ The default locale setting is now implemented as suggested by @iskrisis.

lukasbestle avatar Aug 14 '22 13:08 lukasbestle