icu
icu copied to clipboard
ICU-22885 Add parsing of approximately sign
This adds support for parsing the approximately sign and fixes the bug observed in ICU-22885.
Checklist
- [x] Required: Issue filed: ICU-22885
- [x] Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
- [x] Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
- [x] Issue accepted (done by Technical Committee after discussion)
- [x] Tests included, if applicable
- [ ] API docs and/or User Guide docs changed or added, if applicable
Do we also need a Java fix for this?
Added Java and fixed the comment.
I'm coming to the party late, so I apologize if these are dumb questions, but what are you actually doing here, and why is that the appropriate response to the original issue? If I'm reading this correctly, this makes the number parser explicitly aware of the approximately sign (in all the various locales), but just basically ignores it in parsing. Is that right, and is that what we want to do?
- What does this symbol mean in practice? Why would somebody be using it in text that we parse?
- I get why calling
abort()is a bad idea, but why wouldn't it be better to just signal a parse error? - If ignoring the character is the right thing to do, why do we need code to explicitly identify it as the approximately sign? Couldn't you just have a generic list of characters that should be ignored in parsing?
The bug uncovered that we didn't handle the approximately sign in parsing, even though it is supported in patters and in formatting, which I added relatively recently (a few years ago).
Treating the approximately sign the same way as the plus sign makes sense to me. With the plus sign, we accept it and it doesn't impact the resulting parsed value. I mostly copied the plus sign code to make the approximately sign code.
The bug uncovered that we didn't handle the approximately sign in parsing, even though it is supported in patters and in formatting, which I added relatively recently (a few years ago).
Treating the approximately sign the same way as the plus sign makes sense to me. With the plus sign, we accept it and it doesn't impact the resulting parsed value. I mostly copied the plus sign code to make the approximately sign code.
Okay, I'll accept that. Thanks for the explanation. Given that, the code looks okay to me.
@sffc please take a look at the feedback from @FrankYFTang
Hooray! The files in the branch are the same across the force-push. 😃
~ Your Friendly Jira-GitHub PR Checker Bot
I used the tool to squash the branch because @FrankYFTang's comments are stylistic only. It would take me more time to find the branch and fix the style. I might try to get to it this week but I'm currently overwhelmed with urgent items and I want to make this mergeable for ICU 78 RC.
Notice: the branch changed across the force-push!
- icu4c/source/common/static_unicode_sets.cpp is different
- icu4c/source/i18n/numparse_symbols.cpp is different
- icu4c/source/test/intltest/numfmtst.cpp is different
- icu4j/main/core/src/main/java/com/ibm/icu/impl/StaticUnicodeSets.java is different
~ Your Friendly Jira-GitHub PR Checker Bot
Let's not have this be stuck for a second release just on line wrapping... I have amended the commit to address @FrankYFTang 's feedback. @richgillam had approved this already in March, minus the new line wrapping. @sffc ok with my changes? I will also self-approve, and try to remember to merge by EOD if I don't hear otherwise.
Sure