icu icon indicating copy to clipboard operation
icu copied to clipboard

ICU-20512 With currency formatter and empty currency symbol, parseDouble should still work (ICU4C part)

Open pedberg-icu opened this issue 2 years ago • 2 comments

Checklist
  • [x] Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-20512
  • [x] Required: The PR title must be prefixed with a JIRA Issue number.
  • [x] Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • [x] Required: Each commit message must be prefixed with a JIRA Issue number.
  • [x] Issue accepted (done by Technical Committee after discussion)
  • [x] Tests included, if applicable
  • [ ] API docs and/or User Guide docs changed or added, if applicable

Note, this is now targeted to ICU 72.

With a currency formatter set to an empty currency symbol, parseDouble should still work.

Make sure this also handles patterns with currency symbol at end (regardless of whether the symbol is separated by space).

This PR has just the ICU4C changes (based on Apple ICU); if they seem OK (possibly with slight changes), I will make a separate PR for the ICU4J changes.

pedberg-icu avatar Mar 09 '22 08:03 pedberg-icu

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

waiting-on-reviewer @sffc ?

markusicu avatar Mar 16 '22 15:03 markusicu

Did you try an approach where you don't add the matcher for currency if the currency symbol is empty? Then you don't need to poke all these holes in the parsing code. It would be nice if we could fix this in the parser setup.

@sffc I tried doing just the following in NumberParserImpl::createParserFromProperties, but it did not fix the problem:

@@ -140,7 +140,9 @@ NumberParserImpl::createParserFromProperties(const number::impl::DecimalFormatPr
     /// CURRENCY MATCHER ///
     ////////////////////////
 
-    if (parseCurrency || affixProvider.get().hasCurrencySign()) {
+    bool parseCurrencySymbolForDecimalFormat = affixProvider.get().hasCurrencySign() &&
+                                               !symbols.getConstSymbol(DecimalFormatSymbols::kCurrencySymbol).isEmpty();
+    if (parseCurrency || parseCurrencySymbolForDecimalFormat) {
         parser->addMatcher(parser->fLocalMatchers.currency = {currencySymbols, symbols, parseFlags, status});
     }
 

Is that what you meant by not loading the currency matcher?

You might need a post-processing step to set the currency code if the rest of the string matched.

I think parseCurrency should continue to fail if there is no currency symbol to parse, so I do not think I need to set any currency code.

pedberg-icu avatar Sep 15 '22 23:09 pedberg-icu

Yeah, that looks like the right direction.

It's actually really easy to see what's going on in the number parsing code by stepping through it in the debugger. If I apply your change and run your unit test, the thing that is failing the parse is the RequireAffixValidator, which is added due to strict mode.

Also, the currency is getting matched inside an affix matcher. The top-level one is basically only used in lenient mode.

I opened #2187 with a solution that passes your test. It adds extra affix matchers that skip the currency matcher if there is an empty currency symbol in DecimalFormatSymbols.

sffc avatar Sep 16 '22 07:09 sffc

It's actually really easy to see what's going on in the number parsing code by stepping through it in the debugger. If I apply your change and run your unit test, the thing that is failing the parse is the RequireAffixValidator, which is added due to strict mode.

Yeah, I had not yet had time to investigate why the second approach was not working. But even so I would not have come up with the much cleaner solution that you have in #2187.

Also, the currency is getting matched inside an affix matcher. The top-level one is basically only used in lenient mode.

I opened #2187 with a solution that passes your test. It adds extra affix matchers that skip the currency matcher if there is an empty currency symbol in DecimalFormatSymbols.

Excellent, many thanks! I will close this PR in favor of that one (which I have approved and merged).

pedberg-icu avatar Sep 16 '22 16:09 pedberg-icu