Recognizers-Text [* Number] Output isn't consistent across cultures

Open iMicknl opened this issue 5 years ago • 7 comments

Using the NumberModel across multiple cultures in a single application can be quite confusing. For example, in much of Europe the decimal separator is a comma (,) instead of a dot (.). The model parses this correctly, but the culture-specific separator is also carried over into the output value.

Is this something that should be changed in Recognizers-Text, or something that should be added to the documentation to avoid confusion?

> LUIS interprets the variations in user utterances and returns consistent numeric values.

English

  {
    "Input": "4.800",
    "Results": [
      {
        "Text": "4.800",
        "TypeName": "number",
        "Resolution": {
          "subtype": "decimal",
          "value": "4.8"
        }
      }
    ]
  },

German

  {
    "Input": "2.000,352",
    "NotSupportedByDesign": "python",
    "NotSupported": "javascript",
    "Results": [
      {
        "Text": "2.000,352",
        "TypeName": "number",
        "Resolution": {
          "value": "2000,352"
        }
      }
    ]
  }
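
As a stopgap on the consumer side, the culture-formatted value could be rewritten before it is compared or stored. The following is only a minimal sketch: `to_invariant` is a hypothetical helper, not part of Recognizers-Text, and it assumes the caller already knows which separators the culture of the result uses.

```python
# Hypothetical helper, not part of Recognizers-Text: rewrite a culture-formatted
# value string so that it uses "." as the decimal separator and no grouping.
def to_invariant(value: str, decimal_sep: str = ",", group_sep: str = ".") -> str:
    # Drop grouping separators first, then swap the decimal separator for a dot.
    return value.replace(group_sep, "").replace(decimal_sep, ".")


# The German resolution value from the example above.
print(to_invariant("2000,352"))   # 2000.352
# The raw German input, including the thousands separator.
print(to_invariant("2.000,352"))  # 2000.352
```

The defaults match German-style formatting; for a culture that already uses the dot as decimal separator, the value should be left untouched, otherwise the dot would be stripped as if it were a grouping separator.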

Aug 07 '18 21:08 iMicknl

Yes, all values should use a consistent format (. as decimal separator). This is a known issue ~~and a fix is on the way. Consumers of the packages need time to adapt to potential breaking changes.~~ Thanks for following up on this!

!! This has been cancelled as the issue was considered a breaking change !!

Aug 08 '18 03:08 tellarin

While working on the NumberRangeModel for Dutch, I am running into this issue again. If I use the comma as the decimal separator, it breaks the output, for example: (0,5,). What would be the suggested approach for the time being? (The NumberRangeModel still needs quite some work, so there will be no PR or release in the short term.)

    "Results": [
      {
        "Text": "Meer dan de helft",
        "TypeName": "numberrange",
        "Resolution": {
          "value": "(0.5,)"
        }
      }
    ],

Message: Assert.AreEqual failed. Expected:<(0.5,)>. Actual:<(0,5,)>. Input: "Meer dan 1/2 van de mensen is aanwezig."
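
The failing assertion shows why the comma is a problem in range values: it is also the delimiter between the two bounds. A short Python sketch of the ambiguity follows; `format_range` is a hypothetical helper for illustration, not the library's implementation, and it relies on `str(Decimal)` always using a dot regardless of the system locale.

```python
from decimal import Decimal
from typing import Optional


def format_range(lower: Optional[Decimal], upper: Optional[Decimal]) -> str:
    # Hypothetical helper: an empty string stands for an open bound.
    lo = "" if lower is None else str(lower)
    hi = "" if upper is None else str(upper)
    return f"({lo},{hi})"


# With a dot as decimal separator the range splits into exactly two bounds.
print(format_range(Decimal("0.5"), None))  # (0.5,)
print("(0.5,)".strip("()").split(","))     # ['0.5', '']

# With a comma as decimal separator the same range splits into three parts.
print("(0,5,)".strip("()").split(","))     # ['0', '5', '']
```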

Sep 10 '18 21:09 iMicknl

Describe the bug:

The Parse function in https://github.com/Microsoft/Recognizers-Text/blob/5a9e0c701794df544c2d6b458c5c46374b14ce5b/.NET/Microsoft.Recognizers.Text.Number/Parsers/BaseNumberParser.cs#L857 does not take a culture parameter. This breaks the TestNumber_English --> NumberModel --> "two hundred point seventy-one" test case on systems whose culture uses a decimal separator other than a dot.

To Reproduce (steps to reproduce the behavior):

1. Have a system whose culture is set to, e.g., Dutch.
2. Run the TestNumber_English --> NumberModel --> "two hundred point seventy-one" test case (set a conditional breakpoint testSpec.Input.Contains(" point seventy") on line 336 in TestBase.cs).
3. The actual result is 271 instead of 200.71, because 0.71 is parsed as 71.

Platform (please complete the following information):

Platform: .NET
Environment: Source code
Version of package: commit https://github.com/Microsoft/Recognizers-Text/commit/65d122193cdfd0ac5973a8ffaaf07aecd39960f8

Additional context: This seems closely related to this existing bug, so I decided to add it as a comment rather than open a new bug.
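
The mechanism behind the 271 result can be illustrated outside .NET. The Python sketch below is an analogy only: `parse_with_separators` is a hypothetical stand-in for a culture-sensitive parse, assuming that a Dutch culture treats the dot as a grouping separator and the comma as the decimal separator.

```python
from decimal import Decimal


def parse_with_separators(text: str, decimal_sep: str, group_sep: str) -> Decimal:
    # Hypothetical stand-in for a culture-sensitive parse: grouping separators
    # are discarded and the decimal separator is mapped to ".".
    normalized = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(normalized)


# Invariant-style separators: "0.71" stays a fraction.
invariant = parse_with_separators("0.71", decimal_sep=".", group_sep=",")
# Dutch-style separators: the dot is dropped as a grouping separator.
dutch = parse_with_separators("0.71", decimal_sep=",", group_sep=".")

print(Decimal(200) + invariant)  # 200.71
print(Decimal(200) + dutch)      # 271
```

Passing the separators (or, in .NET, an explicit culture) instead of relying on the ambient system setting keeps the result independent of the machine the tests run on.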

Mar 28 '19 21:03 hansmbakker

@hansmbakker, also related to #885.

Mar 28 '19 22:03 iMicknl

@tellarin, any update regarding this issue? I am facing #885 again. If this issue is not a priority anymore, it would be great if we could reopen #885. I could have a look to make sure the test suite always uses the same culture settings.

Jul 17 '20 13:07 iMicknl

@tellarin, should we address this one?

Nov 03 '20 05:11 aitelint

Issue still pending as changing the default output was considered a breaking change.

Apr 30 '21 07:04 tellarin
