Localization through `.po` files
Currently, language variables are stored in the database and supplied by plugins as an XML file. Translating these texts is not easy for contributors, because no suitable external software or platform can load our structure. In addition, if a translation is missing, the user currently sees a cryptic text (the name of the language variable) that is of no use to them. Some language variables also contain a lot of logic that would be better placed in PHP or template files. The administrator must also remain able to customize variables. A standardized format should solve these problems.
Pro/Cons for .po files
| Pro | Cons |
|---|---|
| Standardized format; simplifies translation for contributors | Parsing `.po` files takes some time, so we have to cache them |
| Native support of plural forms | Not as easily searchable as a database |
| Different contexts (e.g. informal forms) are possible natively | gettext cannot load `.po` files directly; they have to be converted to `.mo` first |
| If a translation is missing, the English text (the default language of the variable) is output instead of the name of the language variable | |
| We don't need to check whether a variable already exists, because the variables of a plugin are stored in their own domain/category | |
Possible libraries
The PHP extension gettext, which is needed to load `.mo` files, is not available everywhere. Additionally, we need something that converts the human-readable `.po` files to `.mo` files, because gettext can only load `.mo` files.
For the implementation, we can use different libraries to load the `.po` files.
Lightweight Library
- https://github.com/php-gettext/Translator
- https://github.com/php-gettext/Gettext
There is an implementation that wraps the native gettext extension (GettextTranslator). If this extension is not installed, another implementation (Translator) can be used instead.
How to use:
msgid ""
msgstr ""
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"MIME-Version: 1.0\n"
"Language: de\n"
msgid "Simple message"
msgstr "Einfache Nachricht"
msgid "There is one file"
msgid_plural "There are {$files} files"
msgstr[0] "Es gibt eine Datei"
msgstr[1] "Es gibt {$files} Dateien"
msgctxt "informal"
msgid "You have one new message"
msgid_plural "You have {$messages} new messages"
msgstr[0] "Du hast eine neue Nachricht"
msgstr[1] "Du hast {$messages} neue Nachrichten"
msgctxt "formal"
msgid "You have one new message"
msgid_plural "You have {$messages} new messages"
msgstr[0] "Sie haben eine neue Nachricht"
msgstr[1] "Sie haben {$messages} neue Nachrichten"
use Gettext\Translator;
use Gettext\Loader\StrictPoLoader;
use Gettext\Generator\ArrayGenerator;
$translator = new Translator();
$loader = new StrictPoLoader();
$generator = new ArrayGenerator();
foreach (\glob(__DIR__ . '/locale/*.po') as $file) {
// Of course, we do not need to parse the `.po` files every time.
// We can use the `ArrayGenerator` to generate a string, which we save in a `.php` file and then load this instead of the `.po` file.
$translations = $loader->loadFile($file);
// Expected file name: `{package.name}-{local}.po`
// e.g. `com.woltlab.wcf-en_US.po`
$translations->setDomain($packageName);
// Insert the translations from our database here for the package `$packageName` if any exists
//$translations = $translations->mergeWith($adminTranslations, Merge::TRANSLATIONS_OVERRIDE);
$translator->addTranslations($generator->generateArray($translations));
}
// Get the translation
echo $translator->dgettext('com.woltlab.wcf', 'Simple message'); // Einfache Nachricht
echo $translator->dngettext('com.woltlab.wcf', 'There is one file', 'There are {$files} files', 1); // Es gibt eine Datei
// If the informal context is used, then this must be specified, otherwise no suitable translation will be found
echo $translator->dngettext('com.woltlab.wcf', 'You have one new message', 'You have {$messages} new messages', 5); // You have {$messages} new messages
echo $translator->dnpgettext('com.woltlab.wcf', "formal", "You have one new message", 'You have {$messages} new messages', 5); // Sie haben {$messages} neue Nachrichten
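The `Plural-Forms` header in the `.po` file decides which `msgstr[n]` entry is used for a given count. As a rough illustration, the following sketch hard-codes a few well-known rules instead of parsing the header; the function name `pluralIndex` is hypothetical and not part of any library mentioned here.

```php
<?php
declare(strict_types=1);

/**
 * Returns the index into `msgstr[]` for the given count.
 * The rules are hand-written equivalents of common `Plural-Forms` headers;
 * a real implementation would evaluate the header of the loaded file instead.
 */
function pluralIndex(string $languageCode, int $n): int
{
    return match ($languageCode) {
        // nplurals=2; plural=n != 1;
        'de', 'en' => $n !== 1 ? 1 : 0,
        // nplurals=2; plural=n > 1;
        'fr' => $n > 1 ? 1 : 0,
        // nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
        'pl' => $n === 1 ? 0 : (($n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20)) ? 1 : 2),
        default => throw new \InvalidArgumentException("No plural rule for '{$languageCode}'"),
    };
}

// The two German forms from the `.po` example above.
$forms = ['Es gibt eine Datei', 'Es gibt {$files} Dateien'];

echo $forms[pluralIndex('de', 1)], "\n"; // Es gibt eine Datei
echo $forms[pluralIndex('de', 5)], "\n"; // Es gibt {$files} Dateien
echo $forms[pluralIndex('de', 0)], "\n"; // Es gibt {$files} Dateien
```

The placeholder `{$files}` stays in the output because placeholder replacement is a separate step from plural selection.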
Symfony
- https://github.com/symfony/translation/
Symfony never uses the native functions of the gettext extension.
The translator requires a message formatter for the output of the text. The standard implementation MessageFormatter uses the PHP function strtr to replace the variables and includes an IntlFormatter, which can use PHP's Intl MessageFormatter. Which one is used depends on whether the domain ends with +intl-icu (IntlFormatter) or not (strtr).
When a .po file is loaded, the plural forms are processed, but they are stored internally as {singular-form}|{plural-form-1}|{plural-form-2}|..., and this is exactly how the message has to be passed to the function:
echo $translator->trans('There is one file|There are fileCount files', ['fileCount' => 5]);
// Es gibt eine Datei|Es gibt 5 Dateien
Unfortunately, we can't do much with this output and would have to use the IntlFormatter for plural forms. Alternatively, we can implement our own formatter that uses our template system and its plural function.
How to use:
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"MIME-Version: 1.0\n"
"Language: de\n"
msgid "Simple message"
msgstr "Einfache Nachricht"
msgid "{apples, plural,=0 {There are no apples} =1 {There is one apple} other {There are # apples!}}"
msgstr "{apples, plural,=0 {Es gibt keine Äpfel} =1 {Es gibt einen Apfel} other {Es gibt # Äpfel!}}"
$local = 'de_DE';
$translator = new Translator($local, new MessageFormatter());
$translator->addLoader('po', new PoFileLoader());
foreach (\glob(__DIR__ . '/locale/*.po') as $file) {
// Expected file name: `{package.name}-{local}.po`
$translator->addResource('po', $file, $local, $packageName);
// If we want to use the ICU message format, we need to add the suffix `+intl-icu`.
// Possibly the file name should contain this suffix, or we add it explicitly here.
// https://symfony.com/doc/current/reference/formats/message_format.html
// $translator->addResource('po', $file, $local, $packageName . "+intl-icu");
}
echo $translator->trans('Simple message', domain: "com.woltlab.wcf+intl-icu"); // Einfache Nachricht
echo $translator->trans(
'{apples, plural,=0 {There are no apples} =1 {There is one apple} other {There are # apples!}}',
['apples' => 5],
'com.woltlab.wcf+intl-icu'
); // Es gibt 5 Äpfel!
Client side implementation
In JavaScript, we cannot load the .po file directly, because the translations from the database would be missing. We therefore have the option of delivering an already parsed JSON file or using our precache. We must also be able to register individual language variables in templates, so that the client does not need to load all language variables with every request.
The gettext functions are not available on the client (browser). Therefore, we need a suitable library that provides the same or very similar functions as on the server side. These could be considered:
- https://github.com/i18next/i18next
- https://github.com/guillaumepotier/gettext.js/
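To illustrate the idea of delivering only the language variables a template has registered, here is a minimal server-side sketch that builds a JSON payload from an already merged translation array. The function name `buildClientPayload`, the payload layout, and the sample data are assumptions for illustration; the format that a client library such as i18next or gettext.js actually expects would have to be checked separately.

```php
<?php
declare(strict_types=1);

/**
 * Builds the JSON payload for the messages a template has registered.
 * Messages without a translation are left out, so the client can fall back
 * to the English message id, as on the server side.
 *
 * @param array<string, string> $translations all translations of one domain
 * @param list<string> $registered message ids requested by the template
 */
function buildClientPayload(string $domain, array $translations, array $registered): string
{
    // Keep only the entries whose message id was registered.
    $subset = \array_intersect_key($translations, \array_flip($registered));

    return \json_encode(
        // Cast to object so that an empty result is encoded as `{}` and not `[]`.
        ['domain' => $domain, 'messages' => (object)$subset],
        \JSON_THROW_ON_ERROR | \JSON_UNESCAPED_UNICODE
    );
}

// Hypothetical sample data.
$translations = [
    'Simple message' => 'Einfache Nachricht',
    'Date' => 'Datum',
    'Hostname' => 'Hostname',
];

echo buildClientPayload('com.woltlab.wcf', $translations, ['Simple message', 'Not translated']), "\n";
// {"domain":"com.woltlab.wcf","messages":{"Simple message":"Einfache Nachricht"}}
```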
Implementation
Where is the data stored?
Each plugin saves its `.po` files under `{WCF_DIR}/locale/{$packageName}-{$local}.po`.
For example, the core saves its translations under `./locale/com.woltlab.wcf-en_US.po`.
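The naming scheme makes it possible to derive the domain and the language from the file name alone. A minimal sketch, assuming locales look like `de` or `en_US`; the helper name `parseLocaleFileName` is hypothetical:

```php
<?php
declare(strict_types=1);

/**
 * Splits a file name of the form `{packageName}-{locale}.po` into its parts.
 * The locale pattern (`de` or `en_US`) is an assumption of this sketch.
 *
 * @return array{packageName: string, locale: string}|null null if the name does not match
 */
function parseLocaleFileName(string $path): ?array
{
    // The package name is matched greedily, so the locale is whatever follows the last hyphen.
    $pattern = '/^(?<packageName>.+)-(?<locale>[a-z]{2,3}(?:_[A-Z]{2})?)\.po$/';
    if (\preg_match($pattern, \basename($path), $matches) !== 1) {
        return null;
    }

    return ['packageName' => $matches['packageName'], 'locale' => $matches['locale']];
}

\var_dump(parseLocaleFileName('./locale/com.woltlab.wcf-en_US.po'));
\var_dump(parseLocaleFileName('./locale/README.md')); // NULL
```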
In addition, we save the administrator's customizations in the database. These are loaded after the `.po` file and overwrite its translations.
gettext can only load `.mo` files, so we need to generate them from the `.po` files. The generated `.mo` file must be located under `{WCF_DIR}/locale/{$languageCode}/LC_MESSAGES/{$packageName}.mo`.
We save the cache under `{WCF_DIR}/locale/cache/{$languageCode}/{$packageName}.php`. This is only required if the gettext extension is not available.
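The merge and cache step could look roughly like the following sketch. It uses a flat `msgid => msgstr` array as a simplification; the array produced by the library's `ArrayGenerator` has a different, nested layout. The function name `writeTranslationCache`, the temporary directory, and the sample data are assumptions for illustration only.

```php
<?php
declare(strict_types=1);

/**
 * Writes the merged translations of one domain to a PHP cache file.
 * Administrator overrides win over the values from the `.po` file.
 *
 * @param array<string, string> $fromFile translations parsed from the `.po` file
 * @param array<string, string> $overrides customizations from the database
 */
function writeTranslationCache(string $cacheFile, array $fromFile, array $overrides): void
{
    $merged = \array_replace($fromFile, $overrides);
    $code = "<?php\n\nreturn " . \var_export($merged, true) . ";\n";

    if (!\is_dir(\dirname($cacheFile))) {
        \mkdir(\dirname($cacheFile), 0777, true);
    }

    // Write to a temporary file first so that a concurrent request never includes a half-written cache.
    $tmpFile = $cacheFile . '.' . \bin2hex(\random_bytes(4)) . '.tmp';
    \file_put_contents($tmpFile, $code);
    \rename($tmpFile, $cacheFile);
}

// Stand-in for `{WCF_DIR}/locale/cache/{$languageCode}/{$packageName}.php`.
$cacheFile = \sys_get_temp_dir() . '/locale-cache-demo/de/com.woltlab.wcf.php';

writeTranslationCache(
    $cacheFile,
    ['Simple message' => 'Einfache Nachricht', 'Date' => 'Datum'],
    ['Date' => 'Tag'] // hypothetical administrator override
);

$translations = include $cacheFile;
echo $translations['Date'], "\n"; // Tag
```

The cache has to be rebuilt whenever the `.po` file or an administrator override changes.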
PHP implementation
The following sketch does not use a formatter for the output yet; `format()` must be replaced later by a suitable implementation. We can use either the Intl MessageFormatter or our template engine for this.
use Gettext\GettextTranslator;
use Gettext\Translations;
use Gettext\Translator;
use Gettext\TranslatorInterface;
class Language extends DatabaseObject {
/**
* @var string[]
*/
private array $domains = [];
private TranslatorInterface $translator;
/**
* Lookup a message and return the translated string.
*/
public function __(string $message, string $domain = 'com.woltlab.wcf', array $variables = []): string
{
$this->loadDomain($domain);
return $this->format(
$this->getTranslator()->dgettext($domain, $message),
$variables
);
}
/**
* Plural version of the `__()` function
* @see https://www.php.net/manual/en/function.dngettext.php
*/
public function n__(
string $singular,
string $plural,
int $count,
string $domain = 'com.woltlab.wcf',
array $variables = []
): string {
$this->loadDomain($domain);
return $this->format(
$this->getTranslator()->dngettext($domain, $singular, $plural, $count),
$variables
);
}
/**
* Lookup for a message in the specified context.
* @see https://www.php.net/manual/en/function.dcgettext.php
*/
public function p__(
string $context,
string $message,
string $domain = 'com.woltlab.wcf',
array $variables = []
): string {
$this->loadDomain($domain);
return $this->format(
$this->getTranslator()->dpgettext($domain, $context, $message),
$variables
);
}
/**
* Plural version of the `p__()` function
* @see https://www.php.net/manual/en/function.dcngettext.php
*/
public function np__(
string $context,
string $singular,
string $plural,
int $count,
string $domain = 'com.woltlab.wcf',
array $variables = []
): string {
$this->loadDomain($domain);
return $this->format(
$this->getTranslator()->dnpgettext($domain, $context, $singular, $plural, $count),
$variables
);
}
/**
* Short function for `p__()` with the context "informal" or "formal" depending on the `LANGUAGE_USE_INFORMAL_VARIANT` constant.
*/
public function f__(
string $message,
string $domain = 'com.woltlab.wcf',
array $variables = []
): string {
$context = LANGUAGE_USE_INFORMAL_VARIANT ? "informal" : "formal";
return $this->p__($context, $message, $domain, $variables);
}
/**
* Short function for `np__()` with the context "informal" or "formal" depending on the `LANGUAGE_USE_INFORMAL_VARIANT` constant.
*/
public function fn__(
string $singular,
string $plural,
int $count,
string $domain = 'com.woltlab.wcf',
array $variables = []
): string {
$context = LANGUAGE_USE_INFORMAL_VARIANT ? "informal" : "formal";
return $this->np__($context, $singular, $plural, $count, $domain, $variables);
}
private function format(string $text, array $variables = []): string
{
// Need to implement a formatter for the output
return $text;
}
protected function loadDomain(string $domain): void
{
if (\in_array($domain, $this->domains, true)) {
return;
}
$this->domains[] = $domain;
$translator = $this->getTranslator();
if ($translator instanceof GettextTranslator) {
// Use Gettext\Generator\MoGenerator to generate the `.mo` file
$path = WCF_DIR . '/locale/' . $this->languageCode;
$translator->loadDomain($domain, $path);
} else {
\assert($translator instanceof Translator);
// load precached translations from a php file
$translations = include WCF_DIR . '/locale/cache/' . $this->languageCode . '/' . $domain . '.php';
$translator->loadTranslations($translations);
}
}
private function getTranslator(): TranslatorInterface
{
if (!isset($this->translator)) {
if (\function_exists("gettext")) {
$this->translator = new GettextTranslator($this->languageCode);
} else {
$this->translator = new Translator();
}
}
return $this->translator;
}
}
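The `format()` stub in the class above returns the text unchanged, which is why the examples in this proposal still show `{$files}` in their output. As a placeholder until the Intl MessageFormatter or the template engine is wired in, a simple replacement could look like the sketch below. The function name `replacePlaceholders` is hypothetical, and the sketch only handles plain `{$name}` placeholders, not expressions such as `{$post->username}`.

```php
<?php
declare(strict_types=1);

/**
 * Replaces `{$name}` placeholders with HTML-escaped values.
 * Placeholders without a matching variable are left untouched.
 *
 * @param array<string, string|int|float> $variables
 */
function replacePlaceholders(string $text, array $variables = []): string
{
    $replacements = [];
    foreach ($variables as $name => $value) {
        $replacements['{$' . $name . '}'] = \htmlspecialchars((string)$value, \ENT_QUOTES, 'UTF-8');
    }

    return \strtr($text, $replacements);
}

echo replacePlaceholders('Es gibt {$files} Dateien', ['files' => 5]), "\n"; // Es gibt 5 Dateien
echo replacePlaceholders('IP Address for {$username}', ['username' => '<b>Max</b>']), "\n";
// IP Address for &lt;b&gt;Max&lt;/b&gt;
```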
The `Language` class can then be used like this:
msgid ""
msgstr ""
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"MIME-Version: 1.0\n"
"Language: de\n"
msgid "Simple message"
msgstr "Einfache Nachricht"
msgid "There is one file"
msgid_plural "There are {$files} files"
msgstr[0] "Es gibt eine Datei"
msgstr[1] "Es gibt {$files} Dateien"
msgctxt "informal"
msgid "You have one new message"
msgid_plural "You have {$messages} new messages"
msgstr[0] "Du hast eine neue Nachricht"
msgstr[1] "Du hast {$messages} neue Nachrichten"
msgctxt "formal"
msgid "You have one new message"
msgid_plural "You have {$messages} new messages"
msgstr[0] "Sie haben eine neue Nachricht"
msgstr[1] "Sie haben {$messages} neue Nachrichten"
use WCF\system\language\LanguageFactory;
$language = LanguageFactory::getInstance()->getLanguageByCode("de");
echo $language->__("Simple message"); // Einfache Nachricht
echo $language->n__("There is one file", 'There are {$files} files', 1, variables: ["files" => 1]); // Es gibt eine Datei
echo $language->__("There is no translation for this one!"); // There is no translation for this one!
$messages = 5;
// LANGUAGE_USE_INFORMAL_VARIANT = 0
echo $language->fn__('You have one new message', 'You have {$messages} new messages', $messages, variables: ["messages" => $messages]); // Sie haben {$messages} neue Nachrichten
// LANGUAGE_USE_INFORMAL_VARIANT = 1
echo $language->fn__('You have one new message', 'You have {$messages} new messages', $messages, variables: ["messages" => $messages]); // Du hast {$messages} neue Nachrichten
This is how we can use this in a template
{__ singular='Simple message'} --> Einfache Nachricht
{__ singular='There is one file' plural='There are {$files} files' count=1 files=1} --> Es gibt eine Datei
{__ singular='There is one file' plural='There are {$files} files' count=5 files=5} --> Es gibt {$files} Dateien
{* Use the function `Language::fn__()` by setting `useFormal=true`. *}
{__ singular='You have one new message' plural='You have {$messages} new messages' useFormal=true count=$messages messages=$messages} --> Sie haben {$messages} neue Nachrichten
Examples of how it could be used
final class ArticleCategoryAction implements RequestHandlerInterface {
// …
private function getForm(): Psr15DialogForm
{
// $form = new Psr15DialogForm(
// static::class,
// WCF::getLanguage()->get('wcf.article.button.setCategory')
// );
$form = new Psr15DialogForm(
static::class,
WCF::getLanguage()->__('Set category')
);
// …
}
}
final class UsersAwaitingApprovalAcpDashboardBox extends AbstractAcpDashboardBox
{
// …
#[\Override]
public function getTitle(): string
{
// return WCF::getLanguage()->getDynamicVariable('wcf.acp.dashboard.box.usersAwaitingApproval', [
// 'usersAwaitingApproval' => $this->getUsersAwaitingApproval(),
// ]);
return WCF::getLanguage()->n__(
'{$usersAwaitingApproval} User Awaiting Approval',
'{$usersAwaitingApproval} Users Awaiting Approval',
$this->getUsersAwaitingApproval(),
variables: [
'usersAwaitingApproval' => $this->getUsersAwaitingApproval(),
]
);
}
// …
}
Example of how to use a language variable from a plugin
…
{* <h2 class="sectionTitle">{lang}wbb.post.information{/lang}</h2> *}
<h2 class="sectionTitle">{__ domain="com.woltlab.wbb" singular="Information"}</h2>
…
…
{* <h2 class="sectionTitle">{lang}wbb.post.ipAddress.post{/lang}</h2> *}
<h2 class="sectionTitle">{__ domain="com.woltlab.wbb" singular='IP Address for {$post->username}' post=$post}</h2>
<table class="table">
<thead>
<tr>
{*
<th class="columnText">{lang}wbb.post.ipAddress.title{/lang}</th>
<th class="columnText">{lang}wbb.post.ipAddress.hostname{/lang}</th>
<th class="columnDate">{lang}wcf.global.date{/lang}</th>
*}
<th class="columnText">{__ domain="com.woltlab.wbb" singular='IP Address'}</th>
<th class="columnText">{__ domain="com.woltlab.wbb" singular='Hostname'}</th>
<th class="columnDate">{__ singular='Date'}</th>
</tr>
</thead>
…
Usage of dynamic variable names
We currently use WCF::getLanguage()->getDynamicVariable("wcf.like.objectType.{$objectType}");.
We have several options for implementing this
Option 1
msgctxt "like"
msgid "com.woltlab.wcf.comment"
msgstr "Comment"
WCF::getLanguage()->p__("like", $objectType);
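Option 1 relies on message contexts. gettext catalogs store an entry with a context under the key `msgctxt`, followed by the byte `\x04`, followed by `msgid`, and a lookup that misses should fall back to the plain message rather than to this composite key. The following sketch shows that convention with an in-memory array; `lookupWithContext` and the `$lookup` callable are hypothetical stand-ins for the translator, not its actual API.

```php
<?php
declare(strict_types=1);

/**
 * Looks up a message with context through a plain lookup callable.
 * `$lookup` must return its input unchanged when no translation exists,
 * which is how gettext-style lookups behave.
 */
function lookupWithContext(callable $lookup, string $context, string $message): string
{
    // Contextual entries are keyed as "<context>\x04<msgid>".
    $key = $context . "\x04" . $message;
    $translation = $lookup($key);

    // On a miss, fall back to the untranslated message, not to the composite key.
    return $translation === $key ? $message : $translation;
}

// In-memory catalog holding the entry from Option 1.
$catalog = [
    "like\x04com.woltlab.wcf.comment" => 'Comment',
];
$lookup = static fn (string $key): string => $catalog[$key] ?? $key;

echo lookupWithContext($lookup, 'like', 'com.woltlab.wcf.comment'), "\n"; // Comment
echo lookupWithContext($lookup, 'like', 'com.woltlab.wcf.unknown'), "\n"; // com.woltlab.wcf.unknown
```

The second call shows the drawback of this option: if the translation is missing, the object type name is output instead of a readable English text.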
Option 2
msgid "Comment"
msgstr "Comment"
interface ILikeObjectTypeProvider {
public function getTitle(): string;
}
class LikeableCommentProvider extends AbstractObjectTypeProvider implements
ILikeObjectTypeProvider,
IViewableLikeProvider
{
// …
public function getTitle(): string {
return WCF::getLanguage()->__('Comment', domain: "com.woltlab.wcf");
}
// …
}
$likeableProvider = ObjectTypeCache::getInstance()->getObjectTypeByName("com.woltlab.wcf.like.likeableObject", $objectType)->getProcessor();
$likeableProvider->getTitle();
Option 3
msgid "Comment"
msgstr "Comment"
<type>
<name>com.woltlab.wcf.comment</name>
<definitionname>com.woltlab.wcf.like.likeableObject</definitionname>
<classname>wcf\data\comment\LikeableCommentProvider</classname>
<title>Comment</title>
</type>
$objectType = ObjectTypeCache::getInstance()->getObjectTypeByName("com.woltlab.wcf.like.likeableObject", $objectTypeName);
WCF::getLanguage()->__($objectType->title, domain: $objectType->getPackage()->packageName);
Intl-MessageFormatter
Formatting with Intl's MessageFormatter has some advantages and disadvantages compared to our template engine.
| Pro | Cons |
|---|---|
| Numbers and dates are converted directly into the correct language format | Only simple variable replacements and if/switch cases are possible; links, for example, must be passed in as variables |
| Easier for translators to understand | Variables must be escaped before they are passed to the formatter |
| The text should be parsed faster | Messages need to be escaped beforehand, see https://unicode-org.github.io/icu/userguide/format_parse/messages/#quotingescaping |
| Same output of the text on the client and server side | |
The texts should be adapted as follows:
| Current | Intl format |
|---|---|
| Simple text | Simple text |
| `This is a <a href="{link controller="Foo"}{/link}">link</a>.` | `This is a <a href="{link}">link</a>.` |
| `Value of foo is: {$foo->bar}` | `Value of foo is: {bar}` |
| `{if $foo == 1}Foo is one{else}Foo has some other values{/if}` | `{foo, select, 1 {Foo is one} other {Foo has some other values}}` |
| `{if $foo == 1 && $bar == 1}Foo and bar are one{else}Foo and bar have some other values{/if}` | --- |
| `{implode from=$array item=item glue=' '}{$item}{/implode}` | --- |
| `{time time=$time}` | `{date, date, long} {date, time}` (`woltlab-core-date-time` will be missing) |
| `{#$count}` | `{count,number,integer}` |
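For reference, this is roughly how such patterns are evaluated with PHP's Intl MessageFormatter. The sketch requires the `intl` extension; `formatIcu` is a hypothetical helper name, and the plural pattern is the one from the Symfony example.

```php
<?php
declare(strict_types=1);

/**
 * Thin wrapper around PHP's Intl MessageFormatter (requires the `intl` extension).
 */
function formatIcu(string $locale, string $pattern, array $arguments): string
{
    $result = \MessageFormatter::formatMessage($locale, $pattern, $arguments);
    if ($result === false) {
        throw new \InvalidArgumentException("Could not format ICU message: {$pattern}");
    }

    return $result;
}

if (\extension_loaded('intl')) {
    $pattern = '{apples, plural, =0 {Es gibt keine Äpfel} =1 {Es gibt einen Apfel} other {Es gibt # Äpfel!}}';

    foreach ([0, 1, 5] as $apples) {
        echo formatIcu('de_DE', $pattern, ['apples' => $apples]), "\n";
    }
    // Es gibt keine Äpfel
    // Es gibt einen Apfel
    // Es gibt 5 Äpfel!

    // Numbers are formatted according to the locale.
    echo formatIcu('de_DE', '{count, number, integer} Dateien', ['count' => 12345]), "\n"; // 12.345 Dateien
}
```

Values are inserted as they are, so any HTML escaping of variables has to happen before they are passed in.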
What happens to the current language variables?
Because the new system needs new functions and a new TemplateCompilerFunction, the old variables will continue to work until they are removed from the database. From that point on, only the name of the language variable is output.