Reconsider FluentBundle#formatToParts()

Open eemeli opened this issue 6 years ago • 1 comments

With the bundle & message API being reconsidered (most recently in #380) and repositioned more explicitly as the lowest-level API for messages, I would like the earlier decision to remove formatToParts() to be reconsidered, possibly to the extent of providing only a formatToParts method on the bundle, rather than format or formatPattern. My main argument is that how to stringify or represent each non-string part should be an implementation-specific decision.

Having looked through the history, the only arguments for the removal that I've been able to find were from @stasm:

Its only use case in fluent-react was replaced by overlays (#104)
Its implementation was "buggy and under-spec'ed" (#104)
Passing "React elements as props to <Localized> to interpolate them into the translation [...] is a bad localization practice because it results in the translation being split into multiple strings and then interpolated." (#103)

Addressing each of the above concerns in turn:

At least vue-i18n supports what it calls component interpolation, which effectively allows for components (i.e. objects) to be passed through the localization without stringification. For messageformat I've just filed a PR (messageformat/messageformat#242) enabling this by making the output type configurable, such that its compiled formatter functions may return an array of parts instead of a single string. Similar use cases are likely to be found outside of the core Fluent libraries, should a formatToParts method be available.
For another use case, consider translations that would be used in more than one output format. One that I've encountered personally is using the same translations both in React/HTML and in plain-text emails. In HTML, it's useful to be able to express emphasis for the same strings as <em>foo</em>, but then use markdown-ish _foo_ in plain-text contexts.
As for the implementation being buggy and under-specified: the formatting functions are being refactored anyway, and from that premise it'd be rather easy to define the method's behaviour and output.
I don't agree with the assertion that interpolating elements into a translation is a bad localization practice. If it were true, why would terms be included in the Fluent spec? Of course it's possible to construct over-complex localizations with interpolated parts, but exactly that is already enabled by terms. Without the variable pass-through that formatToParts would enable, the only currently available solution is to use overlays, and to re-parse the output string as XML before being able to construct the actual output. And that seems rather clumsy.
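To make the multi-format use case more concrete, here is a minimal sketch of how one array of parts could be rendered for two targets. The shape of the parts, the emphasis() marker and both renderer functions are assumptions made up for this illustration; none of them exist in Fluent.

```javascript
// Hypothetical marker object for an argument that should be emphasized.
const emphasis = (text) => ({ kind: "emphasis", text });

// Parts as a formatToParts-style method might return them (assumed shape):
// literal strings interleaved with untouched argument objects.
const welcomeParts = ["Welcome back, ", emphasis("Anna"), "!"];

// Escape the characters that matter inside HTML text content.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// HTML target: emphasis becomes an <em> element.
function toHtml(parts) {
  return parts
    .map((part) =>
      typeof part === "string"
        ? escapeHtml(part)
        : `<em>${escapeHtml(part.text)}</em>`
    )
    .join("");
}

// Plain-text target: emphasis becomes markdown-ish underscores.
function toPlainText(parts) {
  return parts
    .map((part) => (typeof part === "string" ? part : `_${part.text}_`))
    .join("");
}

console.log(toHtml(welcomeParts)); // Welcome back, <em>Anna</em>!
console.log(toPlainText(welcomeParts)); // Welcome back, _Anna_!
```

The point is only that the decision about representation moves to the caller; the translation itself stays the same for both targets.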

Implementation-wise, the change here would be minimal. Based on stasm:formatPattern:

Return result rather than result.join("") from Pattern() in resolver.js
For string patterns, wrap the return from formatPattern() in bundle.js in an array, or allow the function to return either a string or an array.
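As a rough illustration of those two changes, the sketch below uses a simplified stand-in for the resolver. resolvePattern() and the { name } placeholder shape are hypothetical and much simpler than the real Pattern() in resolver.js; the only point is the difference between joining the result and returning it.

```javascript
// Hypothetical, simplified stand-in for the resolver's Pattern() function.
// `elements` mixes literal strings with { name } variable references.
function resolvePattern(elements, args) {
  const result = [];
  for (const element of elements) {
    if (typeof element === "string") {
      result.push(element);
    } else {
      // Variable reference: pass the argument through untouched.
      result.push(args[element.name]);
    }
  }
  return result;
}

// String-returning behaviour: every part is stringified and joined.
function formatPattern(elements, args) {
  return resolvePattern(elements, args).map(String).join("");
}

// Proposed behaviour: the caller receives the parts and decides what to do.
function formatToParts(elements, args) {
  return resolvePattern(elements, args);
}

const acceptTerms = ["I accept the ", { name: "tosLink" }, "."];
const linkObject = { type: "a", href: "/tos", text: "Terms of service" };

console.log(formatPattern(acceptTerms, { tosLink: "terms" }));
console.log(formatToParts(acceptTerms, { tosLink: linkObject }));
```

With the string version, passing linkObject would yield "[object Object]" in the middle of the sentence; with the parts version, the same object comes back at index 1 unchanged.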

With those changes, here's an example of what would be possible:

accept-terms = I accept the {$tosLink}.
  .tos = Terms of service

import React from 'react' // just for the example

function AcceptTermsLabel({ bundle, href, ...props }) {
  const msg = bundle.getMessage('accept-terms')
  const tos = bundle.formatToParts(msg.attributes.tos)
  const tosLink = <a href={href} key="tos">{tos}</a>
  return (
    <label {...props}>
      {bundle.formatToParts(msg.value, { tosLink })}
    </label>
  )
}

Jun 30 '19 12:06 eemeli

Thanks for opening this, @eemeli, and for a thorough analysis. Would https://github.com/projectfluent/fluent/issues/273 be a better place to discuss this? I'll reply here for now; if you'd like to copy your comments to the spec repo, I'll copy my reply there as well.

I really like the idea behind formatToParts. In practice, however, and for the use-cases we were facing back in the day, it proved to be limiting. Furthermore, the same use-cases were solved by overlays, which we already had experience with from fluent-dom. I still think the decision to remove formatToParts was the right one back then. I'm open to revisiting it today.

Before diving into the details, let me put on my product-owner hat. I'd like to consider this API outside of the scope of the planned FluentBundle/formatPattern changes. formatToParts is unexplored territory. It might prove to be very powerful and helpful, and I do think we should explore it. As far as the planned redesign of the FluentBundle API goes, however, one of the biggest benefits of the formatPattern proposal is that it improves on the current format API, which has been battle-tested for 2 years in Firefox, and for even longer in Firefox OS before that. The recent discussion made me realize that I really just want to do a clean-up or a refresh of the current API, with the goal of releasing a 1.0 version of the implementation which has been used in production for years. I feel strongly that major new additions to this API should be considered for a 2.0 version of @fluent/bundle.

That said, let's start planning right now! :)

I think formatToParts enables new powerful ways of working with translations. I'm not entirely sold on component interpolation, however. Even in your minimal example:

accept-terms = I accept the {$tosLink}.
  .tos = Terms of service

…there are three issues I would point out:

It's not possible to localize the title attribute of the <a> element.
It might not be immediately clear that $tosLink and .tos are somehow related.
The translation is unnaturally split across the value and the attribute. It's true that terms can result in similar splits, but most of the time, terms are meant to substitute single nouns (like brand names) rather than entire parts of sentences.

All these are fixed by overlays, which is why we decided it was a better approach to localizing markup. Plus, overlays open up a lot of possibilities related to nested markup, or text-level markup which isn't present in the source (like marking up borrowed words).

# With Overlays, localizers work with the entire sentence in its full form.
accept-terms = I accept the <a title="TOS">Terms of Service</a>.

It's worth pointing out that overlays don't play well with formatToParts. If the translation is Foo {$link} Bar, the parts are "Foo ", interpolated component, and " Bar". It's not trivial to parse markup in each of them separately.
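A small sketch of why that is hard, under the assumption that the translator wraps the placeholder in markup (the `<b>` element here is made up for the example and is not in the translation above). isBalanced() is a deliberately naive, hypothetical check that only counts tags.

```javascript
// Naive balance check: counts opening and closing tags in one string.
// Good enough for this illustration, not a real markup parser.
function isBalanced(fragment) {
  const opens = (fragment.match(/<[a-z][^>/]*>/gi) || []).length;
  const closes = (fragment.match(/<\/[a-z][^>]*>/gi) || []).length;
  return opens === closes;
}

// Assumed translation: <b>Foo {$link}</b> Bar
// Assumed parts: the markup is split around the interpolated component.
const overlayParts = ["<b>Foo ", { component: "link" }, "</b> Bar"];
const stringParts = overlayParts.filter((part) => typeof part === "string");

// Each string part on its own is not well-formed markup...
console.log(stringParts.map(isBalanced)); // [ false, false ]

// ...while the joined string, which is what overlays operate on, is.
console.log(isBalanced(stringParts.join(""))); // true
```

Any overlay logic working on parts would therefore need to carry parser state across part boundaries, instead of parsing one complete string.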

Overlays are based on the low-level format API which returns a simple string. I acknowledge the fact that there might be other approaches to handling markup in translations, or other use-cases overall, which would benefit from the low-level formatToParts API. I even once tried implementing one myself, see https://github.com/projectfluent/fluent.js/pull/49 :) So I don't want to dismiss formatToParts just because there are overlays. Like I said above, I'm open to re-adding it to the FluentBundle API. I just think it's a big undertaking.

Implementation-wise, I think this is more complex than it appears. https://github.com/projectfluent/fluent/issues/273 in the spec repo would be the best place to discuss the details.

Return result rather than result.join("") from Pattern() in resolver.js

result is an array of formatted FluentTypes (string, FluentNumber etc). We'd need to either not format parts which are somehow marked as special, or wrap non-string parts in a special no-op FluentType subclass whose format just returns the instance itself. IIRC, this is how formatToParts used to work before it was removed.
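The second option could look roughly like the sketch below. All classes here are minimal stand-ins written for this comment, not the real @fluent/bundle types, and FluentPassThrough is a hypothetical name.

```javascript
// Minimal stand-ins for the library's types; assumptions for illustration.
class FluentType {
  constructor(value) {
    this.value = value;
  }
  format() {
    return String(this.value);
  }
}

class FluentNumber extends FluentType {
  format(locale) {
    return new Intl.NumberFormat(locale).format(this.value);
  }
}

// Hypothetical no-op wrapper: formatting returns the instance itself,
// so the wrapped value survives until the caller unwraps it.
class FluentPassThrough extends FluentType {
  format() {
    return this;
  }
}

// Format every resolved part; strings are already final.
function formatParts(resolved, locale) {
  return resolved.map((part) =>
    typeof part === "string" ? part : part.format(locale)
  );
}

const tosTarget = { href: "/tos" };
const formatted = formatParts(
  ["You have ", new FluentNumber(1234.5), " items; see ", new FluentPassThrough(tosTarget), "."],
  "en-US"
);

// The number is now a string, the wrapped object is still an object.
console.log(formatted.map((part) => typeof part));
```

A consumer would then check for FluentPassThrough instances in the output and read their value, while treating everything else as plain text.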

This has consequences for scenarios which require the formatted result to be postMessaged between processes, or in which it comes from implementations in WebAssembly. I'm sure it's all doable in the end, but working with strings returned from formatPattern is likely easier in these situations.

There are also places inside of the resolver which need the stringified version of the referenced pattern to work. Consider:

-term = Term
    .gender = {$arg}
hello = {-term.gender ->
    [male] ...
    [female] ...
   *[other] ...
}

-term.gender needs to be resolved to a string so that it can be compared against the variant keys in hello. What should happen if $arg is a non-string part? Should there be a defined way to stringify non-string parts?
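One possible answer, purely as an assumption to make the question concrete, is that a non-string selector never matches a variant key and falls through to the default. selectVariant() below is a hypothetical helper, not the resolver's actual selection logic.

```javascript
// Hypothetical variant selection: only string selectors can match a key;
// anything else falls back to the default variant.
function selectVariant(selector, variants, defaultKey) {
  const matches =
    typeof selector === "string" &&
    Object.prototype.hasOwnProperty.call(variants, selector);
  return matches ? variants[selector] : variants[defaultKey];
}

const genderVariants = {
  male: "male variant",
  female: "female variant",
  other: "other variant",
};

// A string argument selects the matching variant.
console.log(selectVariant("female", genderVariants, "other"));

// A non-string part (e.g. a component object) cannot be compared to keys.
console.log(selectVariant({ href: "/tos" }, genderVariants, "other"));
```

The alternative would be a defined stringification step for non-string parts before comparison, which is exactly the part that would need to be specified.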

Furthermore, in https://github.com/projectfluent/fluent/issues/273#issuecomment-506370303 @Pike suggests that formatToParts could be useful for escaping and bidi isolation. To make it work, I think the parts yielded from formatToParts should be something more than just the result of calling FluentType.format on each pattern element. We'd likely want to at least add some metadata about the origin of the part (message, term, variable?), but I haven't yet thought about this in detail.
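A very rough sketch of what such parts might look like, and how bidi isolation could be layered on top of them. The { source, value } shape is an assumption for discussion, not a proposal for the final API; FSI (U+2068) and PDI (U+2069) are the standard Unicode isolation characters.

```javascript
// Unicode bidi isolation marks: First Strong Isolate, Pop Directional Isolate.
const FSI = "\u2068";
const PDI = "\u2069";

// Assumed part shape: each part records where it came from.
const greetingParts = [
  { source: "message", value: "Hello, " },
  // A Hebrew name passed in as a variable (written with escapes).
  { source: "variable", name: "userName", value: "\u05D0\u05E0\u05D4" },
  { source: "message", value: "!" },
];

// Isolate only the parts that originate from variables, since their
// directionality is unknown to the translation.
function joinWithIsolation(parts) {
  return parts
    .map((part) =>
      part.source === "variable" ? FSI + part.value + PDI : part.value
    )
    .join("");
}

console.log(JSON.stringify(joinWithIsolation(greetingParts)));
```

Escaping could follow the same pattern: a consumer would escape the value of variable parts while trusting text that comes from the message itself.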

Jul 03 '19 19:07 stasm