icu icon indicating copy to clipboard operation
icu copied to clipboard

ICU-22885 Add parsing of approximately sign

Open sffc opened this issue 8 months ago • 5 comments

This adds support for parsing the approximately sign and fixes the bug observed in ICU-22885.

Checklist

  • [x] Required: Issue filed: ICU-22885
  • [x] Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • [x] Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • [x] Issue accepted (done by Technical Committee after discussion)
  • [x] Tests included, if applicable
  • [ ] API docs and/or User Guide docs changed or added, if applicable

sffc avatar Mar 26 '25 02:03 sffc

Do we also need a Java fix for this?

FrankYFTang avatar Mar 26 '25 19:03 FrankYFTang

Added Java and fixed the comment.

sffc avatar Mar 26 '25 22:03 sffc

I'm coming to the party late, so I apologize if these are dumb questions, but what are you actually doing here, and why is that the appropriate response to the original issue? If I'm reading this correctly, this makes the number parser explicitly aware of the approximately sign (in all the various locales), but just basically ignores it in parsing. Is that right, and is that what we want to do?

  1. What does this symbol mean in practice? Why would somebody be using it in text that we parse?
  2. I get why calling abort() is a bad idea, but why wouldn't it be better to just signal a parse error?
  3. If ignoring the character is the right thing to do, why do we need code to explicitly identify it as the approximately sign? Couldn't you just have a generic list of characters that should be ignored in parsing?

richgillam avatar Mar 26 '25 22:03 richgillam

The bug uncovered that we didn't handle the approximately sign in parsing, even though it is supported in patters and in formatting, which I added relatively recently (a few years ago).

Treating the approximately sign the same way as the plus sign makes sense to me. With the plus sign, we accept it and it doesn't impact the resulting parsed value. I mostly copied the plus sign code to make the approximately sign code.

sffc avatar Mar 26 '25 22:03 sffc

The bug uncovered that we didn't handle the approximately sign in parsing, even though it is supported in patters and in formatting, which I added relatively recently (a few years ago).

Treating the approximately sign the same way as the plus sign makes sense to me. With the plus sign, we accept it and it doesn't impact the resulting parsed value. I mostly copied the plus sign code to make the approximately sign code.

Okay, I'll accept that. Thanks for the explanation. Given that, the code looks okay to me.

richgillam avatar Mar 26 '25 23:03 richgillam

@sffc please take a look at the feedback from @FrankYFTang

markusicu avatar Sep 23 '25 00:09 markusicu

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

I used the tool to squash the branch because @FrankYFTang's comments are stylistic only. It would take me more time to find the branch and fix the style. I might try to get to it this week but I'm currently overwhelmed with urgent items and I want to make this mergeable for ICU 78 RC.

sffc avatar Sep 23 '25 03:09 sffc

Notice: the branch changed across the force-push!

  • icu4c/source/common/static_unicode_sets.cpp is different
  • icu4c/source/i18n/numparse_symbols.cpp is different
  • icu4c/source/test/intltest/numfmtst.cpp is different
  • icu4j/main/core/src/main/java/com/ibm/icu/impl/StaticUnicodeSets.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Let's not have this be stuck for a second release just on line wrapping... I have amended the commit to address @FrankYFTang 's feedback. @richgillam had approved this already in March, minus the new line wrapping. @sffc ok with my changes? I will also self-approve, and try to remember to merge by EOD if I don't hear otherwise.

markusicu avatar Sep 26 '25 17:09 markusicu

Sure

sffc avatar Sep 26 '25 19:09 sffc