ICU-22284 dump Numeric_Value property in icuexportdata.cpp

Open m4rch3n1ng opened this issue 2 months ago • 3 comments

currently, there is no way to get the numeric value of a character from the icuexportdata output, so this exports the values into a new nv.toml file. this is a value that icu4x would like to be able to provide (https://github.com/unicode-org/icu4x/issues/3014).

~~for the new nv.toml export, i added a new type of property, a [[value_property]]. a value property is similar to an [[enum_property]], but it doesn't have the values key for the enum variants and it doesn't have a name field for each of the range maps.~~ similar to bmg.toml, this exports an [[enum_property]], but without the values key and without the name field in each of the ranges.

i was a little unsure which value to export, as there were two options: exporting it as a double, or exporting the raw numeric type value (via GET_NUMERIC_TYPE_VALUE(u_getMainProperties(c))). i decided on the second, both because it is smaller (an int32_t instead of a double) and because it is more accurate (floating point numbers cannot accurately represent some fractions, and the highest number that unicode provides is higher than the max safe integer of a double). it is also more flexible, potentially allowing languages with native support for fractions to consume the values as actual fractions. this does put the burden of reinterpreting the value on the consumer side, but i think that is a fine tradeoff.
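to illustrate the accuracy argument, here is a minimal sketch (not code from this PR). `Fraction`, `reduce`, `add`, `equal`, `roundTripsThroughDouble` and `demo` are hypothetical names made up for this example; they are not ICU or ICU4X API. it assumes C++17 and IEEE 754 doubles, and it does not decode the raw numeric type value itself.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Hypothetical exact rational type, for illustration only.
struct Fraction {
    int64_t num;
    int64_t den;  // assumed to be positive
};

// Reduce a fraction to lowest terms.
inline Fraction reduce(Fraction f) {
    const int64_t g = std::gcd(f.num, f.den);
    return g == 0 ? f : Fraction{f.num / g, f.den / g};
}

// Exact sum of two fractions (no overflow checks in this sketch).
inline Fraction add(Fraction a, Fraction b) {
    return reduce(Fraction{a.num * b.den + b.num * a.den, a.den * b.den});
}

// Exact equality via cross-multiplication, so no rounding is involved.
inline bool equal(Fraction a, Fraction b) {
    return a.num * b.den == b.num * a.den;
}

// True if the integer survives a conversion to double and back unchanged.
inline bool roundTripsThroughDouble(int64_t v) {
    return static_cast<int64_t>(static_cast<double>(v)) == v;
}

inline void demo() {
    // As exact fractions, 1/10 + 2/10 is 3/10.
    assert(equal(add(Fraction{1, 10}, Fraction{2, 10}), Fraction{3, 10}));

    // As doubles, the same sum picks up a rounding error.
    assert(0.1 + 0.2 != 0.3);

    // Above 2^53, not every integer has a double representation.
    const int64_t twoTo53 = int64_t{1} << 53;
    assert(roundTripsThroughDouble(twoTo53));
    assert(!roundTripsThroughDouble(twoTo53 + 1));
}
```

a consumer that keeps the exported integer and decodes it into a numerator/denominator pair like this can still produce a double at the very end if it wants one, while the reverse direction loses information.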

i have also made an icu4x branch where i provide this new property: https://github.com/m4rch3n1ng/icu4x/tree/numeric-value. you can also see what the new nv.toml file looks like there.

Checklist

  • [x] Required: Issue filed: ICU-22284
  • [x] Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • [x] Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • [x] Issue accepted (done by Technical Committee after discussion)
  • [ ] Tests included, if applicable
  • [ ] API docs and/or User Guide docs changed or added, if applicable

m4rch3n1ng, Oct 27 '25 16:10

Notice: the branch changed across the force-push!

  • icu4c/source/tools/icuexportdata/icuexportdata.cpp is different

~ Your Friendly Jira-GitHub PR Checker Bot

i noticed (a little late) that what i was doing here previously was essentially what bmg already does, just with a new [[value_property]], whereas bmg.toml is a "normal" [[enum_property]], so i switched to doing that too.

m4rch3n1ng, Oct 28 '25 14:10

@sffc @robertbastian @hsivonen does this look like what you would want for ICU4X?

markusicu, Oct 30 '25 16:10