icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Improve plural operand handling

Open echeran opened this issue 2 years ago • 12 comments

In case we're not already doing it, we should make sure that when parsing a string representing a number and/or performing PluralRules::select():

  • [ ] we parse both compact notation (1.2c3) and scientific notation (1.2e)
  • [ ] distinguish the c operand from the e operand, to future-proof when those behave differently
  • [ ] make sure we compute plural operands correctly for numbers with exponents (the examples list from the latest spec makes for good test cases)

echeran avatar Sep 23 '21 02:09 echeran

@eggrobin Could you take a look at the bullet points in this issue and determine what needs to be done?

sffc avatar May 25 '22 23:05 sffc

@eggrobin Please take a look at this; thanks!

sffc avatar Jul 28 '22 17:07 sffc

Work to be done here:

  • [ ] Update PluralOperands struct
    • [ ] Create field for scientific notation exponent (just like c, but called e; see UTS 35 on Plural Rules for background)
  • [ ] Update FixedDecimal struct
    • [ ] Add field for compact notation exponent (corresponds to PluralOperands::c)
    • [ ] Add field for scientific exponent (to match field for compact exponent)
    • [ ] Add method to allow adjustment of compact exponent (for when constructing anew; can skip scientific for now)
  • [ ] Update parsing & formatting for FixedDecial
    • Can refer to corresponding ICU PR for parse/format methods for compact notation strings
    • [ ] The FromStr and Display impls in FixedDecimal should support parsing the 1.23c4 compact notation. (Also check on Debug and other trait impls too)
  • [ ] Update FixedDecimal's formatter in icu_decimal
    • [ ] Use formatting symbols compact notation
    • [ ] Update the data struct to make such symbols data available to the formatter
  • [ ] Update PluralRules, etc. functionality
    • [ ] Update conversion to recognize compact and scientific notation exponent fields in impl From<&'_ FixedDecimal> for PluralOperands
    • [ ] Update PluralRules::category_for() method to use compact notation exponent field

echeran avatar Sep 15 '22 17:09 echeran

Discussion:

  • We're not fully sure whether we need both c and e in PluralOperands
  • To punt this problem, make PluralOperands be non_exhaustive and add a function PluralOperands::from_ivwftc() for 1.0. Eat the ergonomics hit for now. We can make it exhaustive again in the future once we are more confident.
  • Punt the rest of this ticket to 1.x

sffc avatar Sep 15 '22 18:09 sffc

PluralOperands::from_ivwftc()

Can we make sure to have this as experimental? This is an ugly API and I'd like to get rid of it ASAP

Alternatively, could we have a struct called IVWFTC that PluralOperands can be From? No other methods needed, the new struct would be marked as experimental.

zbraniecki avatar Sep 15 '22 20:09 zbraniecki

The other option is we just change all call sites to do Default::default() and set the fields they need. Would that be more palatable?

sffc avatar Sep 15 '22 21:09 sffc

The other option is we just change all call sites to do Default::default() and set the fields they need. Would that be more palatable?

let po = PluralOperands::from(IVWFTC {
    i: 2,
    v: 1,
    f: 1,
    t: 2,
    c: 0,
});

vs.

let mut po = PluralOperands::default();
po.i = 2;
po.v = 1;
po.f = 1;
po.t = 2;

vs.

let po = PluralOperands::from_ivwftc(2, 1, 1, 2, 0);

My vote is on the first.

zbraniecki avatar Sep 15 '22 21:09 zbraniecki

The advantage of option 2 is that we don't need any more APIs or experimental features.

Between 1 and 3, I slightly prefer 3 because it doesn't involve importing an extra type and calling the less-discoverable From/Into impls; you just invoke the function.

sffc avatar Sep 15 '22 21:09 sffc

After discussing with @markusicu and @echeran, the point was brought up that these fields should really not generally be things that users of ICU4X should know about or mutate.

I'm therefore heavily inclined to recommend the following:

  1. Make all of the fields of PluralOperands be private (pub(crate))
  2. Add from_ivwftc as a doc(hidden) constructor
  3. Revisit this in the future

sffc avatar Sep 20 '22 18:09 sffc

My vote is 1 > 3 >> 2 for reasons stated above.

zbraniecki avatar Sep 20 '22 18:09 zbraniecki

  • @zbraniecki - It seems useful to allow people to mutate their plural operands.
  • @sffc - Some of the operands, like v, w, f, and t, are not independent. It doesn't make sense to change one without changing the others.
  • @zbraniecki - My intuition is to just make it non_exhaustive.
  • @markusicu - It needs to at least be non_exhaustive. It needs to be able to evolve as the spec evolves. But I've been bitten in the past by making public APIs that were not actually useful. I'd rather users have constructors to make these things according to CLDR rules rather than fiddling with the details. But if there's a use case, it's fine to make it public.
  • @zbraniecki - With the AST and parser being public, we allow people to customize their plurals engine. It's weird if we make some parts public and others not.
  • @markusicu - I didn't know that all the other parts were public. Do they need to be?
  • @zbraniecki - They are doc(hidden). They don't need to be public in ICU4X, but it seems to make sense to have a separate crate with different stability rules. There are customers who may need this.
  • @markusicu - My instinct, again, is to just not make public what users don't need.
  • @echeran - What I put in the PR is a public constructor to generate the plural operands, which gives room for, e.g., asserting invariants. In the case when we evolve the PluralOperands struct to, e.g., say that c and e are different operands, we may want to assert that they aren't nonzero at the same time.
  • @sffc - I've always been a bit concerned with the reference AST being part of our public API
  • @zbraniecki - I want to have a discussion on the long-term solution so that we don't preclude that possibilty in the future.
  • @sffc - This sounds like a good use case for experimental in 1.0

sffc avatar Sep 22 '22 17:09 sffc

What to do next:

  • Make icu_plurals::rules experimental
  • Add a type icu_plurals::rules::RawPluralOperands, which is exhaustive with public fields
  • Add conversions between PluralOperands and RawPluralOperands
  • Update code that needs to build PluralOperands directly to use RawPluralOperands instead
  • Drop the PluralOperands::n() method; perhaps move to RawPluralOperands

LGTM: @sffc @zbraniecki @markusicu @echeran

sffc avatar Sep 22 '22 18:09 sffc