
FR: Helper for pluralizing

Open WardBrian opened this issue 5 months ago • 18 comments

I often find myself writing something like

pf ppf " (excluding the %d argument%s)" n_skipped (if n_skipped = 1 then "" else "s")

I think a helper function for this "maybe-add-an-S" would be very useful.

One possible signature is

val plural: int -> 'a t -> 'a t
(** [plural n pp] is [pp] if [n <= 1], [append pp (Fmt.any "s")] otherwise. *)

Its usage could look like pf ppf " (excluding the %d %a)" n_skipped (plural n_skipped string) "argument"
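A minimal sketch of how the proposed plural could behave, written against the standard library's Format module so it runs without fmt installed. The type alias t stands in for 'a Fmt.t, and plural and excluded are hypothetical names taken from or invented for this proposal, not existing Fmt functions. It follows the contract as documented above (no "s" when n <= 1):

```ocaml
(* Stdlib-only stand-in for Fmt's ['a Fmt.t] formatter type. *)
type 'a t = Format.formatter -> 'a -> unit

(* Hypothetical [plural], following the proposed contract:
   [pp] alone when [n <= 1], [pp] followed by "s" otherwise. *)
let plural (n : int) (pp : 'a t) : 'a t =
 fun ppf v ->
  pp ppf v;
  if n > 1 then Format.pp_print_string ppf "s"

(* Illustrative helper that renders the message from the issue. *)
let excluded (n : int) : string =
  Format.asprintf "(excluding the %d %a)" n
    (plural n Format.pp_print_string)
    "argument"
```

Note that with this contract a count of 0 prints the singular form, which is one of the points the discussion below revisits.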

But I can imagine others may be useful, including one that is something like val pluralize: int t -> int t

which would look like pf ppf " (excluding the %a)" (pluralize (fun ppf n -> pf ppf "%d argument" n)) n_skipped

If either of these -- or some variant that hasn't occurred to me -- would be considered, I'd be happy to draw up a PR

WardBrian avatar May 28 '25 21:05 WardBrian

Another option -- perhaps too specific for this library -- would be a function like

val labeled_count: string -> int t
(** Returns an [int t] that produces strings such as "1 LABEL", "2 LABELs" *)
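As a rough illustration, labeled_count could be written with nothing but the standard Format module. This is a sketch of the suggestion above, not an existing Fmt function, and it assumes the plural is always formed by appending "s":

```ocaml
(* Hypothetical [labeled_count]: prints the count, a space, then the label,
   with a trailing "s" for every count other than 1. *)
let labeled_count (label : string) : Format.formatter -> int -> unit =
 fun ppf n ->
  Format.fprintf ppf "%d %s%s" n label (if n = 1 then "" else "s")

(* Illustrative use: render a count of warnings to a string. *)
let show_warnings (n : int) : string =
  Format.asprintf "%a" (labeled_count "warning") n
```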

WardBrian avatar May 28 '25 21:05 WardBrian

I often find myself writing something like

You are not alone.

One problem is that often you actually want to write something specific for 0, 1 and perhaps even 2. Another one is irregular plurals (like [wo]man, [wo]men). So in the end I'm never really sure a specific combinator is much clearer than a match on the count where you specify precisely what you want.
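For comparison, the explicit match on the count might look like the following sketch. The wording for each case is invented for illustration; the point is only that special cases and irregular plurals need no helper:

```ocaml
(* Explicit match on the count: each case spells out exactly what is wanted,
   including the irregular plural "women". *)
let pp_women (ppf : Format.formatter) (n : int) : unit =
  match n with
  | 0 -> Format.pp_print_string ppf "no women"
  | 1 -> Format.pp_print_string ppf "one woman"
  | 2 -> Format.pp_print_string ppf "two women"
  | n -> Format.fprintf ppf "%d women" n
```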

For these reasons I have, up to now, refrained from adding this kind of thing, because it quickly veers towards a message localisation system – which in addition to handling different languages also manages the inflection problem – and I thought that perhaps this should be left to them.

dbuenzli avatar May 29 '25 13:05 dbuenzli

Fair enough -- my use case (error messages, mostly) tends to avoid any irregular words, but it is worth worrying about in generic code. The one_of function was actually what convinced me that something like this could belong in Fmt, but I didn't look at the implementation too closely...

Would the same type of function but called trailing_s or something less general than plural be more clear?

WardBrian avatar May 29 '25 14:05 WardBrian

This is what I have golfed it down to. It seems almost unnecessary for inclusion at this point, but I still think it would be nice to provide "out of the box"

  let trailing_s n pp = Fmt.(pp ++ if' (n <> 1) (any "s"))

WardBrian avatar May 29 '25 18:05 WardBrian

Should it be 0 item or 0 items ?

dbuenzli avatar May 31 '25 09:05 dbuenzli

I think this combinator would only be useful with so-called “countable” nouns (e.g “zero apples”), which are pluralized. It’s true for some words you don’t add an s (e.g. “zero tolerance”), but in those contexts you don’t generally put any other number there or an s anyway (it’s nonsense to say “3 tolerances”, unless you’re referring to something like the total number of manufacturing tolerances in a process, in which case you would actually add the s for 0 of them)

WardBrian avatar May 31 '25 12:05 WardBrian

So I was looking at the CLDR language plural rules for english. For cardinal numbers there are, as you suggested, only two categories. One for 1 and another one for zero and all the other numbers.

I also read the Wikipedia page about English plurals and I find there are too many exceptions for an "adds an s" combinator to be a viable solution. Even a more subtle algorithmic approach is doomed to fail.

I think we should follow what localisation APIs do here (e.g. JavaScript's Intl.PluralRules). This API simply takes the cardinal number and returns the corresponding CLDR category for the selected language. You can then map the category (and for English that's only one or other) to the data you want. For example in JavaScript this could be:

{ one: "child",
  other: "children" }

So I think the best is to simply reuse their terminology and hardcode the cases for english.

Tentatively this could be:

val Fmt.en_cardinal : one:string -> other:string -> int Fmt.t
(** [en_cardinal ~one ~other] formats [1] as the string [one] and 
    zero and other numbers as the string [other]. *)

(one and other could also be unit Fmt.t but I think it will be clearer with direct strings)
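A stdlib-only sketch of what this tentative signature could do, assuming the two English cardinal categories described above (1 maps to one, everything else including 0 maps to other). The function is hypothetical at this point in the thread and is written against Format rather than Fmt so it runs on its own:

```ocaml
(* Hypothetical [en_cardinal]: picks a string by English CLDR cardinal
   category. Only the count 1 falls in the "one" category. *)
let en_cardinal ~(one : string) ~(other : string) :
    Format.formatter -> int -> unit =
 fun ppf n -> Format.pp_print_string ppf (if n = 1 then one else other)

(* Illustrative use: the count is passed twice, once for %d and once for %a. *)
let describe (count : int) : string =
  let children = en_cardinal ~one:"child" ~other:"children" in
  Format.asprintf "Found %d %a." count children count
```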

For example:

let pp_children ppf count = 
  let children = Fmt.en_cardinal ~one:"child" ~other:"children" in
  Fmt.pf ppf "There are %d %a." count children count

What do you think ?

dbuenzli avatar May 31 '25 22:05 dbuenzli

That seems reasonable! Maybe it’s over-thinking things, but I almost want ~other to be ?other with a default value that is equivalent to one ^ "s".

I think the only way to do this would require wrapping with Some when you do want to really provide it, or by e.g. imbuing the empty string with that special meaning, though, so maybe it’s not worth it. Unless OCaml function defaults can depend on other arguments and I’ve somehow missed this fact

WardBrian avatar May 31 '25 23:05 WardBrian

This is doable, you just need to add a unit argument:

val Fmt.en_cardinal : one:string -> ?other:string -> unit -> int Fmt.t
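To make the trick concrete: in OCaml the default expression of an optional argument may refer to parameters that appear before it, and the trailing unit argument lets the compiler know when the optional one has been omitted. A sketch using only Format, with en_cardinal still a hypothetical name:

```ocaml
(* The default for [?other] is computed from the earlier [~one] argument.
   The final [()] marks the application as complete when [?other] is omitted. *)
let en_cardinal ~(one : string) ?(other = one ^ "s") () :
    Format.formatter -> int -> unit =
 fun ppf n -> Format.pp_print_string ppf (if n = 1 then one else other)

(* Regular plural: the default kicks in. *)
let pp_label = en_cardinal ~one:"label" ()

(* Irregular plural: supplied explicitly, no [Some] wrapping needed. *)
let pp_child = en_cardinal ~one:"child" ~other:"children" ()
```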

dbuenzli avatar May 31 '25 23:05 dbuenzli

I don’t mind the extra unit argument, it seems like a nice trade-off for not needing to type the word twice in the typical case

WardBrian avatar May 31 '25 23:05 WardBrian

For example:

let pp_children ppf count = 
  let children = Fmt.en_cardinal ~one:"child" ~other:"children" in
  Fmt.pf ppf "There are %d %a." count children count

Not a very good example.

Perhaps we should also still add:

val Fmt.en_cardinal' : one:int Fmt.t -> other:int Fmt.t -> int Fmt.t

For easily pluralizing sentences:

let pp_children =
  let one ppf _ = Fmt.string ppf "There is one child." in 
  let other ppf n = Fmt.pf ppf "There are %d children." n in
  Fmt.en_cardinal' ~one ~other
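One way such a combinator might be implemented, sketched with Format only. The alias t stands in for 'a Fmt.t, and en_cardinal' remains a hypothetical name; the dispatch simply hands the count to whichever formatter matches its category:

```ocaml
(* Stdlib-only stand-in for Fmt's ['a Fmt.t] formatter type. *)
type 'a t = Format.formatter -> 'a -> unit

(* Hypothetical [en_cardinal']: both cases are full int formatters,
   so each can render an entirely different sentence. *)
let en_cardinal' ~(one : int t) ~(other : int t) : int t =
 fun ppf n -> if n = 1 then one ppf n else other ppf n

(* The example above, restated against the sketch. *)
let pp_children : int t =
  let one ppf _ = Format.pp_print_string ppf "There is one child." in
  let other ppf n = Format.fprintf ppf "There are %d children." n in
  en_cardinal' ~one ~other
```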

dbuenzli avatar May 31 '25 23:05 dbuenzli

I’m not sure “one” would ever use the int argument, so it could be a unit t (or even just a string? the interesting formatting will almost certainly be in other in that style). But I agree that it would be nice to have an alternate signature to avoid passing the same int twice

This gets close to another feature request I was considering, which is for ordinals (1st, 2nd, 3rd, nth). These sometimes also lead to awkward call structures in my formatters

WardBrian avatar Jun 01 '25 01:06 WardBrian

It’s less general, but I personally would go for this

val Fmt.en_cardinal : one:string -> ?other:string -> unit -> int Fmt.t

And then a second function, Fmt.counted_en_cardinal, which handles the common case of "%d %a" n (en_cardinal …) n. The space could even be controlled by ~sep such that you could omit or customize it.

Does that sound reasonable? Some people may still prefer to use the spelled out names for “one” or other small n, but I don’t know if that can be designed around

WardBrian avatar Jun 01 '25 01:06 WardBrian

I’m not sure “one” would ever use the int argument,

Perhaps, but I would still keep it for the following reasons:

  1. It facilitates stuff if you have specific rendering functions (e.g. styled) for rendering cardinals.
  2. If one were to introduce the same functionality for other languages, more than one cardinal could be mapped to that case (e.g. in french) and I'd rather have the same signature for all *_cardinal functions.

But I agree that it would be nice to have an alternate signature to avoid passing the same int twice

For me that's not really the issue. The issue is that in general the whole structure of the sentence changes. Here is a concrete example which I don't find very readable. So I'm not very convinced by your suggestion of Fmt.counted_en_cardinal; it's too specific and inflexible.

This gets close to another feature request I was considering, which is for ordinals (1st, 2nd, 3rd, nth). These sometimes also lead to awkward call structures in my formatters

That would just be a matter of hardcoding the rules for ordinals something like:

val Fmt.en_ordinal : 
  one:int Fmt.t -> two:int Fmt.t -> few:int Fmt.t -> 
  other:int Fmt.t -> int Fmt.t
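A sketch of what hardcoding those rules could look like, using the CLDR ordinal categories for English: one when the number ends in 1 but not 11, two when it ends in 2 but not 12, few when it ends in 3 but not 13, and other for the rest. The code uses Format only; en_ordinal, pp_nth and suffix are hypothetical names for illustration:

```ocaml
(* Stdlib-only stand-in for Fmt's ['a Fmt.t] formatter type. *)
type 'a t = Format.formatter -> 'a -> unit

(* Hypothetical [en_ordinal]: selects a formatter by English CLDR
   ordinal category, based on the last one and two decimal digits. *)
let en_ordinal ~(one : int t) ~(two : int t) ~(few : int t)
    ~(other : int t) : int t =
 fun ppf n ->
  let m10 = abs n mod 10 and m100 = abs n mod 100 in
  let pp =
    if m10 = 1 && m100 <> 11 then one
    else if m10 = 2 && m100 <> 12 then two
    else if m10 = 3 && m100 <> 13 then few
    else other
  in
  pp ppf n

(* Illustrative use: the usual numeric suffixes. *)
let pp_nth : int t =
  let suffix s ppf n = Format.fprintf ppf "%d%s" n s in
  en_ordinal ~one:(suffix "st") ~two:(suffix "nd") ~few:(suffix "rd")
    ~other:(suffix "th")
```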

dbuenzli avatar Jun 01 '25 09:06 dbuenzli

I'm going to try to experiment a bit with these two functions (see this commit in b0).

A few comments:

  1. I think we can avoid the explicit en in the name. Fmt has no notion of locale and there are already error message builders that use English.

  2. So far I avoided adding a specific combinator for strings. Fmt.any allows to quickly plug constant strings.

  3. int formatters are used consistently. In particular using such a formatter for one allows to define a default for other.

  4. For cardinal an explicit zero case which defaults to other is added. You often want something else (in fact the LDML description about plural suggests you may want to add that to a potential pluralizing API.)

A few examples:

let pp_children = 
  let zero = Fmt.any "no children" in
  let one = Fmt.any "child" in
  let other = Fmt.any "children" in
  Fmt.cardinal ~zero ~one ~other () 

let pp_label_count = 
  let one ppf n = Fmt.pf ppf "%d label" n in 
  Fmt.cardinal ~one ()

let pp_error_count = 
  let zero = Fmt.any "no error" in
  let one ppf n = Fmt.pf ppf "%d error" n in
  Fmt.cardinal ~zero ~one () 
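To make the defaulting in points 3 and 4 concrete, here is a stdlib-only sketch. It is not the code from the b0 commit: the assumption that the default other is the one formatter followed by "s" is inferred from the pp_label_count example, and const is a hypothetical stand-in for a constant formatter such as Fmt.any "...":

```ocaml
(* Stdlib-only stand-in for Fmt's ['a Fmt.t] formatter type. *)
type 'a t = Format.formatter -> 'a -> unit

(* Hypothetical constant formatter that ignores its argument. *)
let const (s : string) : 'a t = fun ppf _ -> Format.pp_print_string ppf s

(* Sketch of [cardinal]: [other] defaults to [one] followed by "s"
   (an assumption), and [zero] defaults to [other]. *)
let cardinal ?zero ~(one : int t) ?other () : int t =
  let other : int t =
    match other with
    | Some pp -> pp
    | None ->
        (fun ppf n ->
          one ppf n;
          Format.pp_print_string ppf "s")
  in
  let zero : int t = Option.value zero ~default:other in
  fun ppf n ->
    match n with
    | 0 -> zero ppf n
    | 1 -> one ppf n
    | _ -> other ppf n

let pp_label_count : int t =
  let one ppf n = Format.fprintf ppf "%d label" n in
  cardinal ~one ()

let pp_error_count : int t =
  let one ppf n = Format.fprintf ppf "%d error" n in
  cardinal ~zero:(const "no error") ~one ()
```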

Tell me what you think.

dbuenzli avatar Jun 01 '25 21:06 dbuenzli

Those both look very sleek, I'd be happy to have them as a user :)

WardBrian avatar Jun 03 '25 13:06 WardBrian

One more thing: I never had a need for ordinals, but you seemed interested in them. Should we rather make all the arguments of Fmt.ordinal optional and default them with:

let one ppf n = Fmt.pf ppf "%dst" n
let two ppf n = Fmt.pf ppf "%dnd" n
let three ppf n = Fmt.pf ppf "%drd" n
let other ppf n = Fmt.pf ppf "%dth" n

dbuenzli avatar Jun 03 '25 14:06 dbuenzli

Oh, I like that idea. I would basically never need non-default options in that case.

WardBrian avatar Jun 03 '25 14:06 WardBrian

Thanks for finally forcing me to look for a proper solution :-)

dbuenzli avatar Jun 04 '25 22:06 dbuenzli

Thank you for finding one! Looking forward to deleting some code once these are released.

WardBrian avatar Jun 05 '25 13:06 WardBrian

Not to be a bother, but do you have a sense of when you think a fmt 0.11 which includes these new functions will be released?

WardBrian avatar Jun 27 '25 18:06 WardBrian

Not really, but I can definitely do a release with just that. Ping me again if nothing has come out by the end of July.

dbuenzli avatar Jun 27 '25 18:06 dbuenzli