
MessageFormat formatters

Open nkovacs opened this issue 9 years ago • 23 comments

The messageformat library supports custom formatter functions. You could register globalize's formatters, so they could be used in messages, e.g. "Balance: {0, currency}", or "Posted at {0, datetime, long}".

It would also be great if there was a way to pass custom formatter functions to MessageFormat.
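Here is a minimal sketch of what registering custom formatters could look like. It assumes a hypothetical registry (`customFormatters`) and interpolator (`formatPositional`). Neither is messageformat's or Globalize's actual API, and `Intl` stands in for Globalize's formatters:

```javascript
// Hypothetical registry of custom formatter functions, keyed by the placeholder's
// type name and called as (value, locale, param). Intl stands in for Globalize here.
const customFormatters = {
  // Assumption: default to USD when the message does not name a currency.
  currency: (value, locale, param) =>
    new Intl.NumberFormat(locale, { style: "currency", currency: param || "USD" }).format(value),
  // Assumption: "medium" when no style is given; UTC so output is host-independent.
  datetime: (value, locale, param) =>
    new Intl.DateTimeFormat(locale, {
      dateStyle: param || "medium",
      timeStyle: param || "medium",
      timeZone: "UTC",
    }).format(value),
};

// Replace positional placeholders {0}, {0, type} and {0, type, param}.
function formatPositional(message, locale, args) {
  return message.replace(/\{(\d+)(?:\s*,\s*(\w+)(?:\s*,\s*(\w+))?)?\s*\}/g, (match, index, type, param) => {
    const value = args[Number(index)];
    if (!type) return String(value);
    const fn = customFormatters[type];
    if (!fn) throw new Error("No formatter registered for type: " + type);
    return fn(value, locale, param);
  });
}
```

With real Globalize formatters plugged into the registry, the same lookup would let the message text choose the format instead of the calling code.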

nkovacs, Dec 04 '15

Thanks for your message and sorry for the delayed answer.

Please use variable replacement instead, e.g. "Balance: {currency}" or "Posted at {date}", and have the variable formatted with the appropriate formatter in your code, e.g.:

Globalize.formatMessage("message", {date: Globalize.formatDate(new Date())});

If you find any problem using variable replacement instead or if you have further questions feel free to post additional comments.

PS:

> The messageformat library supports custom formatter functions

... and I was one of the early pushers for such an API to be adopted by SlexAxton/messageformat.js (the library Globalize uses for message formatting under the hood) (link). :smile: (Alex and Eemeli did great work updating the library.) Having said that, given that variable replacement can be used instead without any drawback in this case, we opted for it.

If you want to update Globalize's message format to support such a feature, feel free to contribute the change and I'd be happy to consider it: (a) first, send informal messages to discuss the new API; (b) then, send a pull request with the implementation.

rxaviers, May 11 '16

The problem with using a formatter in the variable is that it doesn't allow you to change the format in the message file. It's hard-coded. This would not only allow the format to be customized for each language, it would also allow changing it without touching the code. E.g. if you have an interface where an admin can change the message files used by your app, this change would allow an admin to customize the format used in a message.

nkovacs, May 11 '16

I made a quick proof of concept. The issues with it are:

  • the formatter function only gets a locale string (or array), so it needs to create a new Globalize instance: https://github.com/nkovacs/globalize/commit/bba99b7a82d25d92818c08114132c2657c7a071a#diff-731a3fca6b201d79e2639fe1456b8787R166
  • I added a global object to Globalize in core.js to collect the formatters: https://github.com/nkovacs/globalize/commit/bba99b7a82d25d92818c08114132c2657c7a071a#diff-7eb52b366866677666470e019283c8eaR89. This is probably not the most elegant way to do this.
  • The compiler compiles the message 'Hello World {now, date, long}' into this:
Globalize.b955419430 = messageFormatterFn((function(  ) {
  return function (d) { return "Hello World " + fmt.date(d.now, ["en"], "long"); }
})());

Ideally I'd like the compiler to automatically detect the call to fmt.date, and compile the dateformatter as well, but I don't know if that's possible with the current version of MessageFormat. This is the test file I used: https://gist.github.com/nkovacs/d6e429f7a5e0871ceb392e739031c100
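The formatter function only receives a locale and the style argument. One way to avoid rebuilding a formatter on every render is to cache by (locale, style). The sketch below assumes `Intl.DateTimeFormat` as a stand-in for a Globalize `dateFormatter` and a UTC time zone for reproducible output. `dateFormatterCache` and `dateFmt` are hypothetical names:

```javascript
// Cache one formatter per (locales, style) pair; the runtime helper is called as
// fmt.date(value, locales, style), mirroring the compiled output above.
const dateFormatterCache = new Map();

function dateFmt(value, locales, style) {
  const key = JSON.stringify([locales, style]);
  let formatter = dateFormatterCache.get(key);
  if (!formatter) {
    formatter = new Intl.DateTimeFormat(locales, { dateStyle: style, timeZone: "UTC" });
    dateFormatterCache.set(key, formatter);
  }
  return formatter.format(value);
}

const fmt = { date: dateFmt };

// Same shape as the compiled message for 'Hello World {now, date, long}'.
const helloWorld = function (d) {
  return "Hello World " + fmt.date(d.now, ["en"], "long");
};
```

This only reduces the cost at runtime. It does not let a compiler pre-build the formatter, which is why detecting the `fmt.date` call statically matters.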

nkovacs, May 11 '16

As a first step, could you please show me a mapping between each Globalize formatter option and its inline message format representation? For example, you mentioned long above: is long the value for date, time, or datetime? Also, how would a skeleton be passed?

> Ideally I'd like the compiler to automatically detect the call to fmt.date, and compile the dateformatter...

Yeap the compiler could statically parse the message and do that (i.e., reuse the message formatter parser to deduce the formatters).
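Here is a rough sketch of that static analysis. It uses a regular expression as a simplified stand-in for the real messageformat parser, so it ignores escaping and only matches placeholders without nested braces. `findFormatterDependencies` is a hypothetical helper:

```javascript
// Formatter types from the proposed mapping; plural/select are handled elsewhere.
const FORMATTER_TYPES = new Set(["date", "time", "datetime", "number", "currency", "unit", "relativetime"]);

// Find simple {arg, type, params...} placeholders anywhere in the message,
// including inside plural/select branches, and list what must be pre-compiled.
function findFormatterDependencies(message) {
  const deps = [];
  const placeholder = /\{\s*(\w+)\s*,\s*(\w+)\s*((?:,[^{}]*)?)\}/g;
  let match;
  while ((match = placeholder.exec(message)) !== null) {
    const [, arg, type, rest] = match;
    if (!FORMATTER_TYPES.has(type)) continue;
    const params = rest.split(",").map((s) => s.trim()).filter(Boolean);
    deps.push({ arg, type, params });
  }
  return deps;
}
```

For the message 'Hello World number( {x, plural, one {one task {now, date, long} } other {{x} tasks {now, date, short} } }', this yields two date dependencies for `now`, one with params `["long"]` and one with `["short"]`.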

rxaviers, May 11 '16

The message was {now, date, long} in the example, so it becomes {date: 'long'}. {now, time, long} would be {time: 'long'}, and {now, datetime, long} would be {datetime: 'long'}.

This is similar to what ICU does (http://icu-project.org/apiref/icu4j/com/ibm/icu/text/MessageFormat.html), except ICU only has date and time.

ICU also accepts a raw format (if the parameter is not one of the short formats), so {now, date, yyyy-MM-dd} would become {raw: 'yyyy-MM-dd'} if you wanted to emulate that.

Skeleton could be implemented in a few different ways:

  • {now, date, skeleton:GyMMMd}: this means you can't have {raw: 'skeleton:yyyy-MM-dd'}, but I don't think that's a big problem.
  • {now, date, skeleton, GyMMMd}: presumably no-one wants to use 'skeleton' as a raw value (or if the last value is missing, skeleton becomes the raw value, so there's no conflict)
  • {now, dateskeleton, GyMMMd}
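The sketch below turns the style argument into Globalize `dateFormatter` options. It assumes the first skeleton syntax above (the `skeleton:` prefix) and an ICU-style fallback to a raw pattern. `dateFormatterOptions` is a hypothetical helper:

```javascript
// Preset styles map directly to {date|time|datetime: style}.
const PRESET_STYLES = new Set(["short", "medium", "long", "full"]);
const SKELETON_PREFIX = "skeleton:";

function dateFormatterOptions(type, style) {
  if (PRESET_STYLES.has(style)) return { [type]: style }; // {now, time, long} -> {time: "long"}
  if (style.startsWith(SKELETON_PREFIX)) {
    return { skeleton: style.slice(SKELETON_PREFIX.length) }; // {now, date, skeleton:GyMMMd}
  }
  return { raw: style }; // {now, date, yyyy-MM-dd} -> {raw: "yyyy-MM-dd"}
}
```

For skeleton and raw values, the date/time/datetime type is ignored, because the pattern itself decides which fields appear.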

The basic mapping could look like this:

  • {now, date, long} -> dateFormatter({date: 'long'})
  • {now, time, long} -> dateFormatter({time: 'long'})
  • {now, datetime, long} -> dateFormatter({datetime: 'long'})
  • {x, relativetime, day} -> relativeTimeFormatter('day')
  • {x, relativetime, day, short} -> relativeTimeFormatter('day', {form: 'short'})
  • {x, number} -> numberFormatter()
  • {x, number, percent} -> numberFormatter({style: 'percent'})
  • {x, currency, USD} -> currencyFormatter('USD')
  • {x, currency, USD, accounting} -> currencyFormatter('USD', {style: 'accounting'})
  • {x, unit, second} -> unitFormatter('second')
  • {x, unit, second, short} -> unitFormatter('second', { form: "short" })
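The mapping above can be expressed as a small lookup from a placeholder's type and parameters to a Globalize formatter method name plus its arguments. This is a sketch of the proposal in this thread, not an existing Globalize API, and `toFormatterCall` is hypothetical:

```javascript
// (type, params) from a placeholder such as {x, currency, USD, accounting}
// -> [methodName, ...methodArgs], following the proposed mapping.
function toFormatterCall(type, params) {
  const [first, second] = params;
  switch (type) {
    case "date":
    case "time":
    case "datetime":
      // This sketch requires a style for date/time/datetime.
      if (!first) throw new Error(type + " needs a style, e.g. {now, " + type + ", long}");
      return ["dateFormatter", { [type]: first }];
    case "relativetime":
      return second ? ["relativeTimeFormatter", first, { form: second }] : ["relativeTimeFormatter", first];
    case "number":
      return first ? ["numberFormatter", { style: first }] : ["numberFormatter"];
    case "currency":
      return second ? ["currencyFormatter", first, { style: second }] : ["currencyFormatter", first];
    case "unit":
      return second ? ["unitFormatter", first, { form: second }] : ["unitFormatter", first];
    default:
      throw new Error("Unknown formatter type: " + type);
  }
}
```

The result could then be applied to an instance. For example, `const [name, ...args] = toFormatterCall("currency", ["USD"]); Globalize("en")[name](...args);` would call `currencyFormatter("USD")`.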

But one reason I'd like to be able to customize the formatters is that I wanted to integrate globalize with Yii, and I could write custom formatters that work the same way ICU in php does (http://www.yiiframework.com/doc-2.0/guide-tutorial-i18n.html#message-formatting). That way I could use the same messages in php and in javascript, which is kind of a pain right now (I have to render everything in php to get pluralization and such things, and then return the html in the ajax response).

> Yeap the compiler could statically parse the message and do that (i.e., reuse the message formatter parser to deduce the formatters).

Yeah, but for globalize I think it would be better if the custom formatter function received the Globalize instance, the same one used to render the message, so you don't have to instantiate another one and compile a new formatter each time the message is rendered (if you're not using pre-compiled files). But that requires modifying messageformat or using messageformat-parser directly. The runtime binding code is already a bit hacky, it could also be cleaned up.
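The sketch below shows formatter factories that receive the instance rendering the message, so that instance is reused instead of a new one being created. `MiniGlobalize` is a hypothetical stand-in built on `Intl`, and `formatterFactories` and `prepareFormatters` are illustrative names:

```javascript
// Hypothetical stand-in for a Globalize instance: dateFormatter(options) returns a
// reusable formatting function, built here on Intl (UTC for reproducible output).
class MiniGlobalize {
  constructor(locale) {
    this.locale = locale;
  }

  dateFormatter(options) {
    const intl = new Intl.DateTimeFormat(this.locale, { dateStyle: options.date, timeZone: "UTC" });
    return (value) => intl.format(value);
  }
}

// Each factory receives the instance rendering the message plus the static
// parameter from the placeholder, and returns a ready-to-call formatter.
const formatterFactories = {
  date: (instance, style) => instance.dateFormatter({ date: style }),
};

// Resolve all formatters for a message once, against a single instance.
function prepareFormatters(instance, specs) {
  return specs.map(({ type, param }) => {
    const factory = formatterFactories[type];
    if (!factory) throw new Error("No formatter factory for type: " + type);
    return factory(instance, param);
  });
}
```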

nkovacs, May 11 '16

I like it so far. /cc @jzaefferer and @alunny for their input.

rxaviers, May 11 '16

I've hacked messageformat so that the custom formatter function can use the same globalize instance, and the dependent formatters can be passed to globalize-compiler: https://github.com/nkovacs/globalize/commit/21cb3b3923fbac0937bc5ab0d626d6e107a6fb30#diff-731a3fca6b201d79e2639fe1456b8787L156

I'll try to clean it up and get it into messageformat.js, but for now I just wanted to show how it could be done in globalize.

nkovacs, May 11 '16

This sounds great. Any news on it?

ccschneidr, Aug 01 '16

Compilation now works. The only thing that needs to be changed in globalize-compiler is the compilation order.

Messageformat.js has since released a major new version, so I'll have to update that too.

nkovacs, Oct 05 '16

Messageformat.js 1.0 has changed so much that the hacks used to integrate it into globalize no longer work. In particular, since the runtime is no longer static, I was unable to extract it and inject it into globalize's message-runtime module.

So instead I copied the messageformat compiler and runtime into globalize.js, and used messageformat-parser (which has since been extracted into a separate npm package). Since I now had direct access to the compiler, I was also able to remove the regexp hacks in messageFormatterRuntimeBind (the compiler can tell the runtime binding function what features are needed, e.g. plurals, select etc.).
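With direct access to the parsed message, the compiler can report which runtime features a message needs. The binding code then no longer has to search the generated source with regexps. The sketch below assumes a token shape loosely modeled on messageformat-parser's output: strings, plus objects with a `type` and, for plural/select, `cases: [{key, tokens}]`. `collectFeatures` is hypothetical:

```javascript
// Walk a parsed message and collect the runtime features it needs.
// Assumed token shape: plain strings, or objects whose `type` is e.g. "argument",
// "function", "plural", "selectordinal" or "select"; plural/select nodes
// carry `cases: [{ key, tokens }]`.
function collectFeatures(tokens, features = new Set()) {
  for (const token of tokens) {
    if (typeof token === "string") continue;
    // Plain {x} arguments are inlined as d.x and need no helper.
    if (token.type !== "argument") {
      features.add(token.type);
    }
    for (const branch of token.cases || []) {
      collectFeatures(branch.tokens, features);
    }
  }
  return features;
}
```

The binding step could then include the plural helper only when the set contains "plural", rather than matching the compiled source against a regexp.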

Here's the commit: https://github.com/nkovacs/globalize/commit/1586e12ff1b7f24a649e442899e76575f6c19b2d

What do you think?

nkovacs, Oct 16 '16

These presets look nice:

  • {now, date, long} -> dateFormatter({date: 'long'})
  • {now, time, long} -> dateFormatter({time: 'long'})
  • {now, datetime, long} -> dateFormatter({datetime: 'long'})

Either of the options below looks nice to me too, except that adding a time field to the skeleton would produce a datetime output, which seems inconsistent with calling it date when we also have message formatters named time and datetime. Do you see what I mean? I have no suggestion at the moment, though.

  • {now, date, skeleton, GyMMMd}
  • {now, dateskeleton, GyMMMd}

rxaviers, Nov 01 '16

> Messageformat.js 1.0 has changed so much that the hacks used to integrate it into globalize no longer work. In particular, since the runtime is no longer static, I was unable to extract it and inject it into globalize's message-runtime module.

> So instead I copied the messageformat compiler and runtime into globalize.js, and used messageformat-parser (which has since been extracted into a separate npm package). Since I now had direct access to the compiler, I was also able to remove the regexp hacks in messageFormatterRuntimeBind (the compiler can tell the runtime binding function what features are needed, e.g. plurals, select etc.).

The existing "live-patch" isn't great, but I want to avoid copying dependencies and modifying them, because that is even harder to maintain over time. We need a better approach... Are there any changes we could propose to their code that would make it easier on our side? Is there some sort of JavaScript patch we could use instead of the bunch of string replacements in the Gruntfile?

rxaviers, Nov 01 '16

About plural requiring cardinal + ordinal data... I want to avoid that. I'm wondering if {plural, ... could use a cardinal formatter, and {selectordinal, ... could use an ordinal formatter?
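The sketch below shows that split: derive the plural rule types from the constructs a message uses, and load only those. `Intl.PluralRules` stands in for Globalize's `pluralGenerator` here, and `pluralTypesFor` and `pluralFunctions` are hypothetical helpers:

```javascript
// {plural, ...} needs cardinal rules; {selectordinal, ...} needs ordinal rules.
function pluralTypesFor(features) {
  const types = [];
  if (features.has("plural")) types.push("cardinal");
  if (features.has("selectordinal")) types.push("ordinal");
  return types;
}

// Build one plural function per required type, and nothing else.
function pluralFunctions(locale, features) {
  const fns = {};
  for (const type of pluralTypesFor(features)) {
    const rules = new Intl.PluralRules(locale, { type });
    fns[type] = (n) => rules.select(n);
  }
  return fns;
}
```

A message that only uses {plural, ...} would then load cardinal data alone.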

rxaviers, Nov 01 '16

> Any of the below look nice to me too, except for the fact that adding a time pattern in the skeleton below will result in a datetime output, which sounds inconsistent with date since we have message formatters named time or datetime.

I didn't implement the skeleton and raw options yet because I'm not sure how to do that. The rest are done: https://github.com/jquery/globalize/commit/4c95d9499efa4add7e0ed80fb8a531f62de754de.

> About plural requiring cardinal + ordinal data... I want to avoid that. I'm wondering if {plural, ... could use a cardinal formatter, and {selectordinal, ... could use an ordinal formatter?

With the custom compiler, yes. I'm not sure if it's doable if using messageformat.js directly, in the current version of globalize. It probably is, but it won't be pretty. That ties into your next question.

> The existing "live-patch" isn't great, but I want to avoid copying dependencies and modifying them, because that is even harder to maintain over time. We need a better approach... Are there any changes we could propose to their code that would make it easier on our side? Is there some sort of JavaScript patch we could use instead of the bunch of string replacements in the Gruntfile?

I've used messageformat-parser from npm (it's not available in bower), so I only had to copy and rewrite the compiler and the runtime, which is relatively small, and that allowed me to customize it to globalize's needs. For example, the new messageFormatterRuntimeBind is much better. I think this is a better approach than heavily patching messageformat.js in the Gruntfile. It might be possible to use messageformat.js from npm and use only the compiler (compiler.js), but that's internal, so you'd again be left with something that can change and break at any time, plus some monkey-patching would still be needed. The changes required to messageformat.js to make it usable in globalize would be extensive. They'd have to make it possible to customize the compiler. I doubt they'd want to add that complexity to messageformat.js, when you can just use messageformat-parser and write your own simple compiler.

nkovacs, Nov 02 '16

> I only had to copy and rewrite the compiler and the runtime

Could you please show me a diff?

rxaviers, Nov 02 '16

  • https://github.com/nkovacs/globalize/blob/1586e12ff1b7f24a649e442899e76575f6c19b2d/src/message/compiler.js, which is based on https://github.com/messageformat/messageformat.js/blob/master/lib/compiler.js (plus some small parts of https://github.com/messageformat/messageformat.js/blob/master/lib/index.js)

and

  • https://github.com/nkovacs/globalize/blob/1586e12ff1b7f24a649e442899e76575f6c19b2d/src/message/formatter-runtime.js, which is based on https://github.com/messageformat/messageformat.js/blob/master/lib/runtime.js (with unnecessary toString code removed)

nkovacs, Nov 02 '16

Yeap, but a diff from the original compiler and runtime to the rewritten ones would make it easier to see what the changes are. Don't worry if you don't have a diff handy, I can generate one...

Basically, I'm in line with your suggestion of using a newer messageformat. Although, I want to better understand the changes and impact.

rxaviers, Nov 02 '16

It's a bit hard to see it here because of the whitespace changes required by the coding standard:

compiler.js: https://gist.github.com/nkovacs/8dea134c8af7345c1c7ed921e9dc7aad/revisions

runtime.js: https://gist.github.com/nkovacs/11f320e6ae60b1dccf943768367dab4d/revisions

The first revision is messageformat.js's version indented with 4 spaces (original is 2 spaces), second revision is my version.

nkovacs, Nov 02 '16

I used your gists and created this diff that ignores white space changes:

  • https://gist.github.com/rxaviers/0e57ed723f28800c06e1c07e6c069f1d

rxaviers, Nov 02 '16

@nkovacs how do you suggest we maintain these files? For instance, suppose messageformat publishes new releases with updates to those files and we want to bring those updates in.

rxaviers, Nov 02 '16

Another question is, what are the challenges and cost of using the new messageformat as is? From your above comments, one of them is "They'd have to make it possible to customize the compiler", what customization would be required please?

rxaviers, Nov 02 '16

The problems I ran into trying to use messageformat 1.0.2:

  • This part here needs changes: https://github.com/jquery/globalize/blob/master/Gruntfile.js#L189 Some of these replacements no longer work, because the code has changed, e.g.: MessageFormat.plurals and MessageFormat._parse. You could fix these, but this patching is hard to maintain.
  • message-runtime has changed to a dynamic object instead of a static one: https://github.com/messageformat/messageformat.js/blob/v1.0.2/messageformat.js#L239. This part won't work: https://github.com/jquery/globalize/blob/master/Gruntfile.js#L237
  • This also won't work: https://github.com/jquery/globalize/blob/master/src/message/formatter-runtime-bind.js#L26, because instead of pluralFuncs.en, it's now just en.

The problems with using messageformat in general (this applies to 0.3.0 as well):

  • messageFormatterRuntimeBind has to use regexp to detect the dependencies of the formatter function: https://github.com/jquery/globalize/blob/master/src/message/formatter-runtime-bind.js#L14 I was able to simplify this by having direct access to the compiler: https://github.com/nkovacs/globalize/blob/563-custom-formatters/src/message/formatter-runtime-bind.js#L13
  • The biggest problem for this feature is this part: https://github.com/messageformat/messageformat.js/blob/v1.0.2/messageformat.js#L113 (similar in 0.3.0: https://github.com/messageformat/messageformat.js/blob/v0.3.0/messageformat.js#L1715). I'd have to use regexps in messageFormatterRuntimeBind to try to figure out which formatters are used, and with what parameters. I changed this part so that it calls the registered formatter factory function, and returns that function to the compiler, which can then bind that: https://github.com/nkovacs/globalize/blob/563-custom-formatters/src/message/compiler.js#L159 (this.formatters is passed as embeddedFormatters to messageFormatterRuntimeBind: https://github.com/nkovacs/globalize/blob/563-custom-formatters/src/message/formatter-runtime-bind.js#L25)

The problem is that messageformat.js compiles {now, date, short} to something like fmt.date(d.now, 'short'), but globalize.js needs the 'short' parameter at compile time to be able to compile an appropriate formatter function.

The minimum change required in messageformat.js would be to return the compiler's runtime property and add the arguments to its formatters object. Globalize's compiler could then generate the appropriate wrappers and an fmt object with a special wrapper fmt.date function, and bind the compiled dateformatter as a dependency.

My version does it slightly differently: {now, date, short} is compiled to fmt[0](d.now), and the wrapper function receives an fmt array, where the 0th element is Globalize.dateFormatter({date: 'short'}).
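The toy version below applies this scheme to flat messages only (no plural or select). Each formatted placeholder becomes `fmt[i](d...)` in the generated source, its static parameters are recorded, and a bind step builds the `fmt` array up front. The names `compile` and `bind`, and the use of `Intl` in place of `Globalize(...).dateFormatter`, are assumptions for illustration:

```javascript
// Compile text plus {arg} and {arg, type, param} placeholders into function source.
// Formatted placeholders become fmt[i](d["arg"]); their specs are returned for binding.
function compile(message) {
  const specs = [];
  const parts = [];
  const re = /\{\s*(\w+)\s*(?:,\s*(\w+)\s*(?:,\s*([^{}]+?))?\s*)?\}/g;
  let last = 0;
  let m;
  while ((m = re.exec(message)) !== null) {
    parts.push(JSON.stringify(message.slice(last, m.index)));
    const [, arg, type, param] = m;
    if (type) {
      parts.push("fmt[" + specs.length + "](d[" + JSON.stringify(arg) + "])");
      specs.push({ type, param });
    } else {
      parts.push("d[" + JSON.stringify(arg) + "]");
    }
    last = re.lastIndex;
  }
  parts.push(JSON.stringify(message.slice(last)));
  return { source: "return function (d) { return " + parts.join(" + ") + "; };", specs };
}

// Bind step: build one formatter per spec up front, then pass the array in as `fmt`.
function bind(compiled, makeFormatter) {
  const fmt = compiled.specs.map(makeFormatter);
  return new Function("fmt", compiled.source)(fmt);
}
```

With Globalize, the `makeFormatter` callback could instead return `Globalize("en").dateFormatter({date: spec.param})`, matching nkovacs's compiled example.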

Here's a complete compiled example:

Globalize.b955419430 = messageFormatterFn((function(plural, fmt, en) {
    return function(d) {
        return "Hello World number( " + plural(d.x, 0, en, {
            one: "one task " + fmt[0](d.now) + " ",
            other: d.x + " tasks " + fmt[1](d.now) + " "
        });
    }
})(messageFormat.plural, [Globalize("en").dateFormatter({
    "date": "long"
}), Globalize("en").dateFormatter({
    "date": "short"
})], Globalize("en").pluralGenerator({
    type: "both"
})), Globalize("en").pluralGenerator({
    "type": "both"
}), Globalize("en").dateFormatter({
    "date": "long"
}), Globalize("en").dateFormatter({
    "date": "short"
}));

and the original message was:

'Hello World number( {x, plural, one {one task {now, date, long} } other {{x} tasks {now, date, short} } }'

I'm not sure why the extra parameters are passed to messageFormatterFn, but I think that's already happening with the current version of globalize.js and pluralGenerator.

nkovacs, Nov 02 '16

Any update on this? I am running into this as well. The messages for me are potentially user-defined, so formatting the value passed in isn't an option.

jrsearles, Aug 20 '19