message-format-wg Design Principle: Computational vs. Manual

A dedicated issue for discussing one of the design dimensions proposed in #50.

Computational vs. Manual

Do we want the runtime to have some capacity to transform translations, e.g. by providing a method to automatically turn text to title-case? Or do we want localizers to provide all possible variants of translations to ensure all edge-cases are handled manually?

Related Issues: #35, #36, #38

Mar 02 '20 16:03 stasm

I would think that the number of computational transforms should be minimized in favor of giving translators the discretion to apply the best text style. While this may lead to some additional verbosity, embedding such transforms in the runtime may lead to unexpected consequences if overused.

Mar 02 '20 17:03 Fleker

When this was discussed in the meeting on 15th June, it felt to me like the computational end of this axis would feature-creep this effort into an attempt at universal rule-based machine translation. Take title casing: I am not sure that any language other than English has title casing, and even in English it is style-guide dependent, so it doesn't seem to be a good candidate for a computational feature in the new message format. On the other hand, rendering factoids such as dates, currency amounts, and quantities with units of measure seems like a good candidate for runtime formatting. But even here there are caveats in morphologically rich languages (such as Slavic languages) that could be hard to handle "computationally":

"This has to be done by [date]." - [date] replaced at runtime with e.g. 16th June -> Czech: 16. června
"The day was [date]." - [date] replaced at runtime with e.g. 16th June -> Czech: 16. červen

So in the above case the translator should be given the option to supply a formatting hint, such as [date-adverb] or [date-nominativ], in their translation.
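A minimal sketch of how such a hint might be honored, in Python. The hint names `date-adverb` and `date-nominativ` come from the example above; the function `format_czech_date`, the mapping of hints to grammatical cases, and the hand-written month table are illustrative assumptions, not part of any existing formatter.

```python
import datetime

# Czech month names in the two forms the examples above need.
# Hand-written for illustration; a real formatter would read these from locale data.
CZECH_MONTHS = {
    "genitive": ["ledna", "února", "března", "dubna", "května", "června",
                 "července", "srpna", "září", "října", "listopadu", "prosince"],
    "nominative": ["leden", "únor", "březen", "duben", "květen", "červen",
                   "červenec", "srpen", "září", "říjen", "listopad", "prosinec"],
}

# Assumed mapping from the translator-facing hint to a month form.
HINT_TO_FORM = {"date-adverb": "genitive", "date-nominativ": "nominative"}


def format_czech_date(date, hint):
    """Format day and month in Czech, using the month form the hint selects."""
    form = HINT_TO_FORM[hint]
    month = CZECH_MONTHS[form][date.month - 1]
    return f"{date.day}. {month}"


june_16 = datetime.date(2020, 6, 16)
print(format_czech_date(june_16, "date-adverb"))     # 16. června
print(format_czech_date(june_16, "date-nominativ"))  # 16. červen
```

The point of the sketch is that the runtime only needs the form tables; the choice of form stays with the translator, who knows the sentence the date lands in.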

One of the most common failures in marketing email campaigns localized from English into Slavic languages is to start the message with something like

Hello [user], ---> wrong Czech: Pavel
Hello [first_name], ---> wrong Czech: Petra
Hello [full_name], ---> wrong Czech: David Filip

as these languages use a specific vocative form, and simply replacing the above name placeholders with a name stored in a CRM will more often than not lead to a grammatically wrong (even insulting) form of address. In the current state of the art, this is best solved by using a neutral salutation that doesn't require the name, e.g. Czech: Dobrý den,

The name variable would have to be marked as canDelete="yes" during Extraction (and the builder/compiler would have to be happy with dropping the name). See https://galaglobal.github.io/TAPICC/T1/WG3/rs01/XLIFF-EM-BP-V1.0-rs01.xhtml#Hints or http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#editinghints

If we went for "computational", the formatter would have to have knowledge of vocative forms creation in the target language to be able to interpret
Hello [user-vocative], ---> correct Czech: Pavle
Hello [first_name-vocative], ---> correct Czech: Petro
Hello [full_name-vocative], ---> correct Czech: Davide Filipe

Jul 27 '20 18:07 DavidFatDavidF

But even here there are caveats in morphologically rich languages (such as Slavic languages) that could be hard to handle "computationally":

ICU supports a parameter that defines where in the sentence a given formatted string will appear. What ECMA402 does now is "stand-alone", but we do plan to add displayContext - https://github.com/tc39/ecma402/issues/355

If we went for "computational", the formatter would have to have knowledge of vocative forms creation in the target language to be able to interpret

I support this: basically, we should be able to provide the user name already declined, and the localization should use it where appropriate.
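One way to read this, sketched in Python: the caller passes the name together with whatever declined forms it has, and the message falls back to the neutral greeting mentioned earlier when no vocative is available. The `greet` function and the dictionary shape are hypothetical; nothing here reflects an actual MessageFormat API.

```python
def greet(name_forms):
    """Build a Czech greeting from caller-supplied name forms.

    name_forms is a hypothetical dict such as
    {"nominative": "Pavel", "vocative": "Pavle"}.
    """
    vocative = name_forms.get("vocative")
    if vocative:
        return f"Dobrý den, {vocative},"
    # No vocative supplied: drop the name rather than use a wrong form.
    return "Dobrý den,"


print(greet({"nominative": "Pavel", "vocative": "Pavle"}))  # Dobrý den, Pavle,
print(greet({"nominative": "Petra"}))                       # Dobrý den,
```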

Jul 27 '20 19:07 zbraniecki

I suspect that the underlying request here is for some standardized built-in functions for the registry.

I also think that displayContext might be related to the @annotation feature in #450 and #426, since display context might be externally supplied at runtime (contextually, one might say 😉), but a message might wish to override it for a given formatter, e.g.:

{The date in a sentence is {$now :datetime}}
{The date as standalone is: {$now :datetime @displayContext=standalone}}
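The override described here amounts to a precedence rule, sketched below in Python. The function `resolve_display_context`, the per-placeholder options dictionary, and the value `middleOfSentence` are assumptions for illustration; `displayContext` itself is only proposed in the ECMA402 issue linked above.

```python
def resolve_display_context(runtime_context, placeholder_options):
    """Pick the display context for one placeholder.

    A per-placeholder annotation wins over the runtime-supplied context.
    """
    return placeholder_options.get("displayContext", runtime_context)


# Context supplied by the caller at runtime (hypothetical value).
runtime = "middleOfSentence"

print(resolve_display_context(runtime, {}))                                # middleOfSentence
print(resolve_display_context(runtime, {"displayContext": "standalone"}))  # standalone
```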

Aug 19 '23 16:08 aphillips

I'm closing this issue in favor of having specific issues raised against the default registry.

Jan 14 '24 16:01 aphillips
