message-format-wg
Design Principle: Computational vs. Manual
A dedicated issue for discussing one of the design dimensions proposed in #50.
Computational vs. Manual
Do we want the runtime to have some capacity to transform translations, e.g. by providing a method to automatically turn text to title-case? Or do we want localizers to provide all possible variants of translations to ensure all edge-cases are handled manually?
Related Issues: #35, #36, #38
I would think that the number of computational transforms should be minimized in favor of giving translators the discretion to apply the best text style. While that may lead to some additional verbosity, embedding such transforms in the runtime may lead to unexpected consequences if overused.
When this was discussed in the meeting on 15th June, it felt to me that the computational end of this axis would let feature creep turn this effort into an attempt at universal rule-based machine translation. Take title casing: I am not sure that any language apart from English has title casing, and even in English title casing is style-guide dependent, so it doesn't seem to be a good candidate for a computational feature in the new message format. On the other hand, rendering factoids such as dates, currency amounts, and quantities with units of measure seems like a good candidate for runtime formatting. But even here there are caveats in morphologically rich languages (such as Slavic languages) that could be hard to handle "computationally":
"This has to be done by [date]."
- [date] replaced at runtime with e.g. 16th June -> Czech: 16. června
"The day was [date]."
- [date] replaced at runtime with e.g. 16th June -> Czech: 16. červen
So in the above case the translator should be given the option to put a formatting hint such as [date-adverb] or [date-nominativ] in their translation.
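As a rough illustration of such a hint, here is a minimal sketch in which the placeholder name selects the grammatical case of the month name. The lookup table, the hint-to-case mapping, and the function names are all hypothetical; a real formatter would take the forms from locale data rather than a hand-written dictionary.

```python
import re

# Hypothetical lookup table: one Czech month name in two grammatical cases.
CZECH_MONTHS = {
    6: {"nominative": "červen", "genitive": "června"},
}

# Hypothetical mapping from the placeholder hints discussed above to a case.
HINT_TO_CASE = {
    "date-adverb": "genitive",
    "date-nominativ": "nominative",
}


def format_czech_date(day, month, hint):
    """Format day and month using the case selected by the translator's hint."""
    case = HINT_TO_CASE[hint]  # raises KeyError for an unknown hint
    return f"{day}. {CZECH_MONTHS[month][case]}"


def render(message, day, month):
    """Replace every [date-...] placeholder in a translated message."""
    def substitute(match):
        return format_czech_date(day, month, match.group(1))

    return re.sub(r"\[(date-[a-z]+)\]", substitute, message)


# The same date renders differently depending on the hint the translator chose.
print(render("[date-adverb]", 16, 6))
print(render("[date-nominativ]", 16, 6))
```

The translator stays in control of which form appears; the runtime only needs to know the forms, not the grammar of the surrounding sentence.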
One of the most common failures in marketing email campaigns localized from English into Slavic languages is to start the message with something like
Hello [user],
---> wrong Czech: Pavel
Hello [first_name],
---> wrong Czech: Petra
Hello [full_name],
---> wrong Czech: David Filip
as these languages use a specific vocative form and simply replacing the above name placeholders with a name stored in a CRM will more often than not lead to a grammatically wrong (even insulting) form of address.
In the current state of the art, this is best solved by using a neutral salutation that doesn't require use of the name.
Czech:
Dobrý den,
The name variable would have to be marked as canDelete="yes" during Extraction (and the builder/compiler would have to be happy with dropping the name).
See https://galaglobal.github.io/TAPICC/T1/WG3/rs01/XLIFF-EM-BP-V1.0-rs01.xhtml#Hints or http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#editinghints
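To make the "builder/compiler would have to be happy" part concrete, here is a minimal sketch of a placeholder check that tolerates a dropped variable when it is marked as deletable. The square-bracket placeholder syntax follows the examples above; the function name and the set of deletable names are hypothetical and stand in for whatever the extraction step records from canDelete.

```python
import re

# Matches the square-bracket placeholders used in the examples above.
PLACEHOLDER = re.compile(r"\[([a-z_]+)\]")


def missing_required(source, target, can_delete):
    """Return source placeholders that the target dropped without permission.

    `can_delete` is a set of placeholder names that were marked as
    deletable during extraction (hypothetical representation).
    """
    source_names = PLACEHOLDER.findall(source)
    target_names = set(PLACEHOLDER.findall(target))
    return [
        name
        for name in source_names
        if name not in target_names and name not in can_delete
    ]


# The neutral Czech greeting drops the name entirely.
print(missing_required("Hello [first_name],", "Dobrý den,", {"first_name"}))
print(missing_required("Hello [first_name],", "Dobrý den,", set()))
```

With the name marked deletable the check reports nothing; without the mark, the same translation is flagged as missing a placeholder.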
If we went for "computational", the formatter would have to have knowledge of vocative forms creation in the target language to be able to interpret
Hello [user-vocative],
---> correct Czech: Pavle
Hello [first_name-vocative],
---> correct Czech: Petro
Hello [full_name-vocative],
---> correct Czech: Davide Filipe
> But even here there are caveats in morphologically rich languages (such as Slavic languages) that could be hard to handle "computationally":
ICU supports a parameter that defines where in the sentence a given formatted string will appear. What ECMA402 does now is "stand-alone", but we do plan to add displayContext - https://github.com/tc39/ecma402/issues/355
> If we went for "computational", the formatter would have to have knowledge of vocative forms creation in the target language to be able to interpret
I support this - basically, we should be able to provide the user name already declined, and the localization should use it where appropriate.
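One way to read this is that the caller passes the name together with its inflected forms, and the message only picks among them. A minimal sketch under that assumption follows; the dictionary shape, the placeholder name, and the fallback to a neutral greeting are hypothetical choices, not an agreed design.

```python
def render_greeting(with_name, neutral, name_forms):
    """Use the supplied vocative form, or fall back to a neutral greeting.

    `name_forms` is a hypothetical mapping from grammatical case to the
    already-declined name, provided by the caller (e.g. from a CRM).
    """
    vocative = name_forms.get("vocative")
    if vocative is None:
        # No vocative available: avoid a grammatically wrong form of address.
        return neutral
    return with_name.replace("[first_name-vocative]", vocative)


# Vocative supplied by the caller.
print(render_greeting(
    "Dobrý den, [first_name-vocative],",
    "Dobrý den,",
    {"nominative": "Petra", "vocative": "Petro"},
))

# Only the nominative is known, so the neutral greeting is used.
print(render_greeting(
    "Dobrý den, [first_name-vocative],",
    "Dobrý den,",
    {"nominative": "Petra"},
))
```

The formatter needs no knowledge of how vocatives are formed; that knowledge stays with whoever supplies the data.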
I suspect that the underlying request here is for some standardized built-in functions for the registry.
I also think that displayContext might be related to the @annotation feature in #450 and #426, since display context might be externally supplied at runtime (contextually, one might say 😉), but a message might wish to override it for a given formatter, e.g.:
{The date in a sentence is {$now :datetime}}
{The date as standalone is: {$now :datetime @displayContext=standalone}}
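The override rule described here can be sketched as "an annotation on the expression wins, otherwise the runtime-supplied value applies". The snippet below is only an illustration of that precedence: the regular expression is a stand-in for a real parser, the annotation syntax is modelled on the examples above, and the runtime value "sentence" is a made-up label.

```python
import re

# Hypothetical stand-in for a real parser: finds @name=value annotations.
ANNOTATION = re.compile(r"@(\w+)=(\w+)")


def effective_display_context(expression, runtime_context):
    """Return the expression's own displayContext if set, else the runtime's."""
    annotations = dict(ANNOTATION.findall(expression))
    return annotations.get("displayContext", runtime_context)


# No annotation: the externally supplied context is used.
print(effective_display_context("{$now :datetime}", "sentence"))

# The message overrides the context for this one formatter.
print(effective_display_context(
    "{$now :datetime @displayContext=standalone}", "sentence"
))
```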
I'm closing this issue in favor of having specific issues raised against the default registry.