message-format-wg
Design document for percent formatting
This document now includes the proposed design discussed in the (poorly documented, sparsely attended) 2025-05-19 call. The emerging consensus appears to be:
- :number/:integer with option style=percent (this is the current design) with scaling
- :unit with unit percent is REQUIRED but other units are not required; this function has NO scaling
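For readers unfamiliar with the scaling distinction, here is a minimal sketch using ECMAScript's `Intl.NumberFormat`, which happens to expose both behaviors. It is an analogy only, not a MessageFormat 2 implementation; the variable names are made up for illustration.

```typescript
// Analogy only: Intl.NumberFormat stands in for the two proposed MF2 functions.
// style "percent" scales the operand by 100 (like :number style=percent).
const scaled = new Intl.NumberFormat("en", { style: "percent" });
// style "unit" with unit "percent" does NOT scale (like :unit unit=percent).
const unscaled = new Intl.NumberFormat("en", { style: "unit", unit: "percent" });

const fromFraction = scaled.format(0.1);       // "10%"
const fromWhole = unscaled.format(10);         // "10%"
const unscaledFraction = unscaled.format(0.1); // "0.1%"

console.log(fromFraction, fromWhole, unscaledFraction);
```

The same operand (0.1) therefore produces different output depending on which function is used, which is why the two options are listed separately above.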
The most recent commit posits that :number/:integer select after scaling because we don't support fraction selection currently.
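A rough sketch of what "select after scaling" means in practice, assuming plural selection behaves like `Intl.PluralRules` and that scaling is a plain multiplication by 100. The helper `selectPercent` is hypothetical and is not taken from the design document.

```typescript
// Assumption: Intl.PluralRules models the plural selection step.
const rules = new Intl.PluralRules("en");

// Hypothetical helper: scale first, then select on the scaled value.
function selectPercent(value: number): string {
  return rules.select(value * 100);
}

// Selecting on the raw operand would need fraction selection (0.01 is "other" in English).
const beforeScaling: string = rules.select(0.01);
// Selecting after scaling sees the value 1, which is "one" in English.
const afterScaling: string = selectPercent(0.01);

console.log(beforeScaling, afterScaling);
```

With selection after scaling, a message can match on the number the user actually sees (1%) rather than on the fractional operand.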
[!NOTE] All previous conversations were marked resolved on purpose and not because their content was in any way deficient, off topic, or necessarily addressed. Please comment on the proposed design.
I agree that unless there is a way to distinguish the source and target unit, it wouldn't scale. I was just pointing that out for people unfamiliar with the way that units work.
(And I'm not a fan of using :unit unit=percent at all to solve the problem of producing "10%" from 0.1, for a number of reasons).
On Mon, Apr 21, 2025 at 4:22 PM Addison Phillips @.***> wrote:
In exploration/percent-format.md https://github.com/unicode-org/message-format-wg/pull/1068#discussion_r2053099527 :
+- Allow unit=percent in :unit that is identical to :percent in formatting performance,
+  - for compatibility with CLDR units,
+  - but document that this usage is not preferred.
I understood that to be the case.
:unit can override the unit, in which case scaling occurs. The question is what happens when there is no other unit? Using MeasureFormat in ICU4J can only be an approximation, since the only way to call it is with a Measure object. Presumably a bare number operand in MF would, behind the scenes, be packaged with the unit.
I'm not suggesting that :unit does not convert. Only that the default behavior of unit=percent is unscaled given a numeric operand. This is different from MF1's handling of operand,number,percent formatting and the proposed performance of :percent. Do you disagree?
I just realised that this whole discussion is also related to https://github.com/unicode-org/message-format-wg/pull/1015#pullrequestreview-2621386507, which we probably ought to address as well.
In other words, as we currently don't have :number notation, we probably ought to figure out how we're going to do its style of scaling as well.
(chair hat OFF)
Adding style=percent to :number would not prevent us from having a convenience function :percent later (or now). This suggests that we prefer a "lumping" vs. "splitting" design for functions, with at least one function with "all the options", possibly surrounded by a host of convenience functions. This is to some degree what we have now, e.g.:
- :number with :integer and possibly :percent and other later functions as convenience
- :datetime with :date, :time (and as many as 16 more) functions as convenience
I'm not opposed to adding convenience functions. I think :integer was exactly the right thing to do for our users.
Alternatively we could regard :percent as a separate function handler (including with the special selection logic related to scaling) with targeted options. This suggests that we might add other specialized functions in the future, although it doesn't require it. We also have this as a model:
- :number has :currency as a (draft) friend. :number cannot format currency values (it can format a number from a currency, but not as a currency)
I think :unit unit=percent is a red herring. It exists because carving out percent from CLDR units is janky (especially given that per-mille and per-myriad exist). My guess is that no one will use it when there's a big shiny function :percent available. Having two baroque means ({$p :number style=percent} and {$p :unit unit=percent}) for formatting percentages is just weird: we have a common-enough use case and two different, equally-inconvenient means of formatting it?
My preferred solution is:
- add :percent
- do NOT add :number style=percent
- do NOT require :unit unit=percent (but permit it)
(chair hat ON)
I observe that we're not closing on a design. If we do not achieve consensus on a design in the next (2025-06-02) call, I will call for a ballot.
I talked with @sffc while we met in person this past week, and it came up that ICU4X would almost certainly want unit formatting to be separated by category. As in, when formatting a value that includes its own unit (say, kilometer), the expression would need to define at least length as the category of supported units. The rationale here is to limit the data loading that would be required for the formatter before it can tell exactly which unit it'll be formatting.
If that is a requirement that we accept, then it suggests to me that we ought to include the category in the function name, so we'd have e.g. :unit:length, :unit:volume and so on, rather than a catch-all :unit. With such an approach, we ought to consider a dedicated :unit:percent, and if so, promote its use rather than adding a style=percent on :number.
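To make the data-loading rationale concrete, here is a minimal sketch of early binding by category. Everything in it is hypothetical: the `Category` type, the pattern data, and `makeCategoryFormatter` are invented for illustration and do not reflect ICU4X's actual API or data layout.

```typescript
// Hypothetical categories and pattern data, for illustration only.
type Category = "length" | "volume" | "percent";

const dataByCategory: Record<Category, Record<string, string>> = {
  length: { kilometer: "{0} km", meter: "{0} m" },
  volume: { liter: "{0} L" },
  percent: { percent: "{0}%" },
};

// Records which categories were loaded, to show that only one is needed.
const loaded: Category[] = [];

function loadCategory(category: Category): Record<string, string> {
  loaded.push(category);
  return dataByCategory[category];
}

// Early binding: the category is known from the function name, so its data
// can be loaded before any operand (and therefore any unit) is seen.
function makeCategoryFormatter(category: Category) {
  const patterns = loadCategory(category);
  return (value: number, unit: string): string => {
    const pattern = patterns[unit];
    if (pattern === undefined) {
      throw new RangeError(`unit ${unit} is not in category ${category}`);
    }
    return pattern.replace("{0}", String(value));
  };
}

const formatLength = makeCategoryFormatter("length");
const result = formatLength(5, "kilometer");
console.log(result, loaded);
```

A catch-all :unit, by contrast, would not know which entry of `dataByCategory` to load until the operand or the unit option is resolved.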
@eemeli suggested:
> If that is a requirement that we accept, then it suggests to me that we ought to include the category in the function name, so we'd have e.g. :unit:length, :unit:volume and so on, rather than a catch-all :unit. With such an approach, we ought to consider a dedicated :unit:percent, and if so, promote its use rather than adding a style=percent on :number.
So :unit would be a namespace? We don't permit nested namespaces, so that would limit implementation-specific extension. Perhaps use an unreserved sigil as a separator instead, e.g. :unit-length, :unit-volume, :unit-percent. Only, once we do that, the unit- part starts to look superfluous. What's the difference between :unit-percent and :percent? Similarly, why type :unit-length instead of :length?
The "requirement" is really a dodge around creating separate functions for each unit or around creating a single function whose data loading depends on an option (or on the operand value). Most implementations bind the unit data late, but we should allow for ICU4X and its need/desire to bind the data early.
I disagree completely.
By "category" I assume what is meant is "quantity" (from SI, with a few other special cases). See https://www.unicode.org/cldr/charts/48/supplemental/unit_conversions.html
There are downsides to the mentioned approach (unit formatting separated by category):
- There are many quantities: the message writer has the burden of looking up the exact quantity being formatted, as well as the unit.
- There are many possible units that don't have an SI quantity. For example, farad-per-square-second.
- The contents of each "chunk" that a memory-constrained implementation (like ICU4X) needs for it to minimize loading may well not be aligned with quantities — so best left to that implementation.
- For any particular unit being formatted, something like a quantity is fairly straightforward to look up, with a small amount of data.
So I don't think it is at all justified to jump through hoops by requiring categories.
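As a rough illustration of the last bullet, looking up a quantity from a unit id can be a small table lookup done by the implementation rather than by the message author. The table below is hypothetical and deliberately tiny; it is not CLDR data, and `quantityFor` is an invented name.

```typescript
// Hypothetical unit-to-quantity table; a real implementation would derive
// this from CLDR's unit conversion data.
const quantityOfUnit = new Map<string, string>([
  ["kilometer", "length"],
  ["meter", "length"],
  ["liter", "volume"],
]);

// The implementation, not the message author, resolves the quantity.
function quantityFor(unitId: string): string | undefined {
  return quantityOfUnit.get(unitId);
}

console.log(quantityFor("kilometer"), quantityFor("liter"));
```

An implementation could use the result to decide which chunk of data to load, without the quantity appearing anywhere in the message syntax.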
Note: the heading at the top of the chart needs some tweaks. For example, it doesn't mention beaufort, which has a more complex conversion than just factor & offset.
The current design of :unit is not implementable with ICU4X's data design for reasons laid out in https://github.com/unicode-org/message-format-wg/issues/1006. We did not take too close of a look at :unit when writing that doc because :unit was not marked as being required, but it suffers from many of the same problems as, for example, u:locale. As a result, requiring "part" of :unit for percent formatting is not feasible.
@eemeli's suggestion of :unit:percent or :unit:length would mitigate this problem.
There are many quantities, but I would be happy enough splitting out the most important ones and throwing everything else into :unit:other or something.
> The current design of :unit is not implementable with ICU4X's data design for reasons laid out in https://github.com/unicode-org/message-format-wg/issues/1006.
I looked at #1006 and didn't find a discussion of unit. I am strongly opposed to requiring quantities with unit ids.
What might work for unit is to require only a small subset of unit ids to be supported. Then ICU4X and other memory-limited implementations could choose to support only the required set. But that is really a separate issue.
> > The current design of :unit is not implementable with ICU4X's data design for reasons laid out in #1006.
>
> I looked at #1006 and didn't find a discussion of unit.
We did not take too close of a look at :unit when writing that doc because :unit was not marked as being required, but it suffers from many of the same problems as, for example, u:locale.
I'm spinning off the :unit discussion into #1079