WCF

Multilingual Content and Localization

Open: dtdesign opened this issue 7 months ago

There have been multiple attempts at improving the handling of multilingual content, with #6109 being the latest iteration. It does solve a lot of the existing issues with phrases and enables reliable sorting and filtering by localized values, but it breaks the ability to provide third-party translations. In addition, the localization would be moved into the PIP XML, which merely moves the problem around instead of solving it for good.

Requirements

  • Localized values must be stored in a separate database table for each type. This enables users to search and sort by localized values.
  • Localized values for DBOs should be attached through a collection instead, which isolates the dynamic data and avoids cluttering the DBO itself.
  • The primary database table must have a nullable column identifier that uniquely identifies an item. The column must be NULL for rows created by the user.
  • Every row in the localized table must have an isPristine flag, which is:
    • Always 0 when the identifier is NULL.
    • 1 if the user has never modified any of the tracked values for this language. When saving content, the flag must only be set to 0 if at least one of the values actually differs.
    • 0 if the user has modified any of the tracked values for this language.
  • There must be logic that synchronizes the data stored in the database whenever a package or translation is updated. It must follow the above rules for identifier and isPristine.
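The isPristine rules above boil down to a small decision that runs whenever a user saves the localized values of one language. The following is a minimal sketch, not part of the draft: computeIsPristine() and its parameters are hypothetical, and it assumes the caller has both the stored and the submitted values of the tracked columns at hand.

```php
<?php

/**
 * Hypothetical helper: decides the new isPristine flag when a user saves the
 * localized values of a single language.
 *
 * @param string|null           $identifier      identifier of the item in the primary table
 * @param bool                  $isPristine      flag currently stored in the localized table
 * @param array<string, string> $storedValues    tracked columns as currently stored
 * @param array<string, string> $submittedValues tracked columns as submitted by the user
 */
function computeIsPristine(
    ?string $identifier,
    bool $isPristine,
    array $storedValues,
    array $submittedValues,
): bool {
    // User-created items have no identifier and are never pristine.
    if ($identifier === null) {
        return false;
    }

    // Once a row has been modified, it stays modified.
    if (!$isPristine) {
        return false;
    }

    // Saving an unchanged form must not clear the flag: only an actual
    // difference in at least one tracked column does.
    foreach ($submittedValues as $columnName => $value) {
        if (($storedValues[$columnName] ?? null) !== $value) {
            return false;
        }
    }

    return true;
}
```

Comparing the values instead of reacting to the save action itself is what keeps a "save without changes" from detaching the row from future package or translation updates.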

Helper Methods

Collections

We already have an internal draft for collections that aim to replace the dreaded Viewable* lists while also providing a lazy method to fetch additional data. They need a bit more testing before we can slot them in, but for the time being we can just pretend that they magically solve this problem.
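To illustrate what keeping localized values out of the DBO could look like, here is a hypothetical sketch. LocalizedValueCollection and its loader closure are assumptions and not the internal draft mentioned above; the closure stands in for a query against the content table and is executed once, for all object IDs, on first access.

```php
<?php

/**
 * Hypothetical sketch: holds the localized values next to the DBOs instead
 * of inside them and fetches them lazily in a single batch.
 */
final class LocalizedValueCollection
{
    /** @var array<int, array<string, string>>|null */
    private ?array $values = null;

    /**
     * @param list<int> $objectIDs
     * @param \Closure(list<int>): array<int, array<string, string>> $loader
     */
    public function __construct(
        private readonly array $objectIDs,
        private readonly \Closure $loader,
    ) {}

    /**
     * @return array<string, string>|null localized columns, or null if none exist
     */
    public function getValues(int $objectID): ?array
    {
        // Run the loader only once, for all objects at the same time.
        if ($this->values === null) {
            $this->values = ($this->loader)($this->objectIDs);
        }

        return $this->values[$objectID] ?? null;
    }
}
```

Nothing is queried until the first call to getValues(), so pages that never display localized values pay nothing for them.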

Synchronizing the Database Values

A helper class could synchronize the database values, using whatever source is available. It can take care of both adding localized values for a new language and adding values for new input fields that previously did not exist.

<?php

use wcf\data\language\Language;
use wcf\system\WCF;

final class Localization
{
    public function __construct(
        private readonly string $primaryTableName,
        private readonly string $contentTableName,
        private readonly string $primaryColumnName,
        /** @var list<string> */
        private readonly array $columnNames,
        private readonly string $phrasePrefix,
    ) {}

    public function synchronize(Language $language): void
    {
        $mapping = $this->getLocalizableItems();
        if ($mapping === []) {
            return;
        }

        $values = $this->getLocalizedValues($language, \array_values($mapping), $this->columnNames);
        if ($values === []) {
            return;
        }

        $columnList = \implode(', ', $this->columnNames);
        $placeholders = \implode(', ', \array_map(static fn() => '?', $this->columnNames));
        $updateValues = \implode(
            ', ',
            \array_map(
                function (string $columnName) {
                    // `ON DUPLICATE KEY UPDATE` does not support a `WHERE` condition.
                    return "{$columnName} = IF(isPristine = 1, VALUES({$columnName}), {$columnName})";
                },
                $this->columnNames
            )
        );

        $sql = "INSERT INTO {$this->contentTableName}
                            ({$this->primaryColumnName}, languageID, isPristine, {$columnList})
                VALUES      (?, {$language->languageID}, 1, {$placeholders})
                ON DUPLICATE KEY UPDATE
                            {$updateValues}";
        $statement = WCF::getDB()->prepare($sql);

        foreach ($mapping as $objectID => $identifier) {
            $localizedValues = $values[$identifier] ?? null;
            if ($localizedValues === null) {
                continue;
            }

            $parameters = [$objectID];
            foreach ($this->columnNames as $columnName) {
                $parameters[] = $localizedValues[$columnName];
            }

            $statement->execute($parameters);
        }
    }

    /**
     * @return array<int, string> object ID => identifier
     */
    private function getLocalizableItems(): array
    {
        $sql = "SELECT  {$this->primaryColumnName}, identifier
                FROM    {$this->primaryTableName}
                WHERE   identifier IS NOT NULL";
        $statement = WCF::getDB()->prepare($sql);
        $statement->execute();

        return $statement->fetchMap($this->primaryColumnName, 'identifier');
    }

    /**
     * @param list<string> $identifiers
     * @param list<string> $columnNames
     * @return array<string, array<string, string>>
     */
    private function getLocalizedValues(Language $language, array $identifiers, array $columnNames): array
    {
        // This returns a mapping of the following structure:
        // [
        //   'someIdentifier' => [
        //     'columnA' => 'Title',
        //     'columnB' => 'Some fancy description',
        //   ],
        // ]
        //
        // If an identifier is set, a valid value must be specified for each
        // column name; they may not be omitted.
        return [];
    }
}
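The upsert in synchronize() only behaves as intended if the content table has a unique key spanning the primary column and languageID, because ON DUPLICATE KEY UPDATE fires only on a key conflict. The following in-memory model, a hypothetical helper named simulateSynchronize(), mirrors what the statement does for one language. It is meant to make the effect of the IF(isPristine = 1, ...) expression easy to check without a database, not to replace the SQL.

```php
<?php

/**
 * In-memory model of the upsert issued by Localization::synchronize() for a
 * single language. Rows are keyed by object ID.
 *
 * @param array<int, array{isPristine: int, values: array<string, string>}> $rows
 * @param array<int, array<string, string>> $incoming object ID => localized values
 * @return array<int, array{isPristine: int, values: array<string, string>}>
 */
function simulateSynchronize(array $rows, array $incoming): array
{
    foreach ($incoming as $objectID => $values) {
        if (!isset($rows[$objectID])) {
            // No row yet (new language or new item): INSERT with isPristine = 1.
            $rows[$objectID] = ['isPristine' => 1, 'values' => $values];
            continue;
        }

        // Duplicate key: every column becomes IF(isPristine = 1, VALUES(col), col),
        // so rows modified by the user keep their values.
        if ($rows[$objectID]['isPristine'] === 1) {
            foreach ($values as $columnName => $value) {
                $rows[$objectID]['values'][$columnName] = $value;
            }
        }
    }

    return $rows;
}
```

Running it against a pristine row, a user-modified row and a missing row shows all three outcomes at once: the first is updated, the second is left alone and the third is inserted as pristine.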

Unresolved Problems

  • The localized values are currently pulled out of thin air because there is no updated system in place for the localizations. There are plans to move these into PHP files, with the wcf1_language_item table serving as an override for these values. Since no finalized draft exists for this yet, it is omitted from this draft.
  • The handling of new columns is a bit awkward: the code currently expects getLocalizedValues() to provide values for all listed columns. We could throw an exception in developer mode but otherwise silently skip those values to avoid bricking a live installation.
  • Monolingual content is currently identified by having a single matching row in the content table with the language ID set to NULL. We probably need an isMultilingual or isLocalized flag in the main table to distinguish them and to skip synchronizing those.
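One possible way to handle incomplete results from getLocalizedValues() is sketched below. filterIncompleteValues() and its $developerMode parameter are hypothetical, and the sketch assumes that "skipping" means dropping the whole entry for an identifier, so that synchronize() never binds a missing column.

```php
<?php

/**
 * Hypothetical filter for the result of getLocalizedValues(): entries that
 * lack any tracked column abort in developer mode and are dropped otherwise.
 *
 * @param array<string, array<string, string>> $values identifier => column => value
 * @param list<string> $columnNames
 * @return array<string, array<string, string>>
 */
function filterIncompleteValues(array $values, array $columnNames, bool $developerMode): array
{
    $complete = [];
    foreach ($values as $identifier => $localizedValues) {
        $missing = \array_diff($columnNames, \array_keys($localizedValues));
        if ($missing === []) {
            $complete[$identifier] = $localizedValues;
            continue;
        }

        if ($developerMode) {
            throw new \UnexpectedValueException(\sprintf(
                "Missing localized value(s) for '%s': %s",
                $identifier,
                \implode(', ', $missing),
            ));
        }

        // Live installation: skip this entry instead of failing the whole sync.
    }

    return $complete;
}
```

Dropping the entry leaves any existing row untouched, which errs on the side of stale but valid data rather than writing NULL into a column that expects text.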

dtdesign, May 12 '25 12:05