
Plural forms parser for pluralization

Open maennchen opened this issue 3 years ago • 2 comments

Proposed Solution

  • Leave default implementation as is
  • Allow plugging in a custom pluralization module (already possible)
  • Extend Gettext.Plural
    • The Plural-Forms header should be passed to both plural & nplurals
  • Based on our experiences going forward, this behavior could also be made the default if people agree to do so.
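
To make the proposed extension concrete, here is a minimal sketch of what a header-aware plural interface could look like. Everything in it is an assumption for illustration: `HeaderAwarePlural`, `MyApp.HeaderPlural`, and the `{locale, header}` tuple are hypothetical names and shapes, not part of gettext. The only thing taken from the existing library is that `Gettext.Plural` has the callbacks `nplurals/1` and `plural/2`, which receive just the locale. The expression evaluator is a deliberate placeholder that recognizes two common rules; a real implementation would delegate to a proper parser.

```elixir
defmodule HeaderAwarePlural do
  @moduledoc """
  Hypothetical behaviour: like Gettext.Plural, but each callback also
  receives the raw Plural-Forms header (nil when the file has none).
  """

  @type context :: {String.t(), String.t() | nil}

  @callback nplurals(context()) :: pos_integer()
  @callback plural(context(), non_neg_integer()) :: non_neg_integer()
end

defmodule MyApp.HeaderPlural do
  @behaviour HeaderAwarePlural

  @impl true
  def nplurals({_locale, header}) when is_binary(header) do
    # Read the form count straight from the header, e.g. "nplurals=2".
    case Regex.run(~r/nplurals\s*=\s*(\d+)/, header) do
      [_, count] -> String.to_integer(count)
      nil -> raise ArgumentError, "no nplurals in #{inspect(header)}"
    end
  end

  # No header available: this sketch assumes a two-form rule.
  def nplurals({_locale, nil}), do: 2

  @impl true
  def plural({_locale, header}, n) when is_binary(header) do
    case Regex.run(~r/plural\s*=\s*([^;]+)/, header) do
      # Strip whitespace and parentheses before matching the rule text.
      [_, expr] -> eval(String.replace(expr, ~r/[\s()]/, ""), n)
      nil -> raise ArgumentError, "no plural rule in #{inspect(header)}"
    end
  end

  def plural({_locale, nil}, n), do: eval("n!=1", n)

  # Placeholder for a real expression parser: only two very common
  # rules are recognized here, anything else raises.
  defp eval("n!=1", n), do: if(n != 1, do: 1, else: 0)
  defp eval("n>1", n), do: if(n > 1, do: 1, else: 0)
  defp eval(other, _n), do: raise(ArgumentError, "unsupported plural rule: #{other}")
end
```

With the header `"nplurals=2; plural=(n != 1);"`, `MyApp.HeaderPlural.nplurals/1` reads the form count from the file instead of looking it up by locale, and `MyApp.HeaderPlural.plural/2` picks form 0 for `n = 1` and form 1 otherwise.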

Problem description

gettext currently does not parse the Plural-Forms header in .po / .mo files. Instead, it provides an Elixir implementation of default pluralization rules. This has a few shortcomings:

Missing Locales

There are 183 ISO 639-1 languages, while gettext provides default plural rules for only 144.

Mismatch Plural Forms Header / Elixir default

Translation tools such as Poedit, Crowdin, and Transifex all support and can output Plural-Forms headers. If Elixir doesn’t honor those headers, the result can differ from the translator's intent. See:

  • https://docs.transifex.com/formats/gettext
  • https://github.com/vslavik/poedit/blob/master/src/language_impl_plurals.h
  • https://store.crowdin.com/gnu-gettext/
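
For illustration, this is the kind of rule such a header encodes. The header below is the Polish example from the GNU gettext manual; the Elixir clauses are a hand translation of that one expression, written only to show what a Plural-Forms parser would have to derive automatically. The module name `PolishHeaderRule` is made up for this sketch.

```elixir
defmodule PolishHeaderRule do
  # Plural-Forms header as a translation tool might write it into a .po file.
  @header "nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);"

  def header, do: @header

  # Hand-written equivalent of the C-style expression in @header.
  def plural(1), do: 0

  def plural(n) when rem(n, 10) in 2..4 and (rem(n, 100) < 10 or rem(n, 100) >= 20),
    do: 1

  def plural(_n), do: 2
end
```

If the header in a file differs from the rule compiled into the library for that locale, the translator's `msgstr[N]` entries are indexed by one rule while lookups use the other.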

XLIFF

XLIFF is the premier interchange format for translators and translation tooling, and it also embeds plural-rule headers:

https://docs.oasis-open.org/xliff/v1.2/xliff-profile-po-1.2-pr-02-20061016-DIFF.pdf

Primitive Fallback

Gettext has extremely primitive locale fallback mechanisms and no proper support for BCP 47 language tags. Its ability to resolve the correct Gettext locale typically comes down to plain string equality: a perfectly valid locale such as en_Latn_US would never match a Gettext locale of en. In that case, how would a user specify plural forms if they can’t provide a Plural-Forms header themselves?
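
The difference can be sketched as follows. This is not gettext's actual lookup code; `LocaleFallback` and both functions are hypothetical, and the truncating variant is only loosely modeled on the "lookup" scheme from RFC 4647 (drop trailing subtags until something matches).

```elixir
defmodule LocaleFallback do
  @doc "Exact string comparison, roughly the matching described above."
  def exact(requested, known), do: Enum.find(known, &(&1 == requested))

  @doc "Drops trailing subtags one at a time until a known locale matches."
  def truncate(requested, known) do
    requested
    |> String.split(["_", "-"])
    |> candidates()
    |> Enum.find(&(&1 in known))
  end

  # ["en", "Latn", "US"] -> ["en_Latn_US", "en_Latn", "en"]
  defp candidates(subtags) do
    for len <- length(subtags)..1//-1 do
      subtags |> Enum.take(len) |> Enum.join("_")
    end
  end
end
```

With known locales `["en", "de"]`, `LocaleFallback.exact("en_Latn_US", ...)` finds nothing, while `LocaleFallback.truncate("en_Latn_US", ...)` resolves to `"en"`. A locale-keyed plural rule is only reachable in the second case, which is why a header in the file itself is the more robust source.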

Thanks to @kipcole9 for providing me with detailed arguments in favor of this feature.

Dependencies / Performance

As discussed with @josevalim via Slack, we do not want to introduce new dependencies into the gettext library other than expo. This could be solved either by pre-compiling the parser with nimble_parsec.compile, which removes nimble_parsec as a dependency, or by switching to a yacc-based approach as well.

Relates to

  • https://github.com/elixir-gettext/gettext/pull/313
  • https://github.com/elixir-gettext/gettext/pull/306
  • https://github.com/elixir-gettext/expo/pull/64
  • https://github.com/elixir-gettext/expo_plural_forms

maennchen avatar Jul 21 '22 10:07 maennchen

Any extension to Gettext.Plural to support the described use cases is welcome.

josevalim avatar Jul 21 '22 11:07 josevalim

@josevalim I'll do that once I have time 😊

maennchen avatar Jul 21 '22 12:07 maennchen

@maennchen this would be what https://github.com/elixir-gettext/expo_plural_forms is?

whatyouhide avatar Dec 23 '22 16:12 whatyouhide

@whatyouhide Yes and no:

The repo is able to parse the headers and accurately choose the correct plural form based on that.

It is, however, not yet a complete solution to everything described in the issue, nor is it currently possible to use it with gettext.

maennchen avatar Dec 23 '22 17:12 maennchen

@maennchen why is it not possible to use it with Gettext?

"It is however not a complete solution yet to everything described in the issue"

The issue is too broad IMO. For example, locale fallback doesn't belong together with parsing the Plural-Forms header, so I think we can split those up.

whatyouhide avatar Dec 24 '22 08:12 whatyouhide

@whatyouhide The current Gettext.Plural behaviour passes only the locale as an argument. To use the plural forms parser, it would need the Plural-Forms header as well.

Agreed on the issue being too broad. We should separate plural form detection from the question of choosing the correct translation based on the user's given locales.

maennchen avatar Dec 24 '22 09:12 maennchen