
[* Number] Output isn't consistent across cultures

Open iMicknl opened this issue 5 years ago • 7 comments

When using the NumberModel across multiple cultures in a single application, the output can be quite confusing. For example, in much of Europe the decimal separator is a comma (,) instead of a dot (.). The model parses this correctly, but the culture-specific separator is also carried over into the output value.

Is this something that should be changed in Recognizers-Text? Or something that should be added to the documentation to avoid confusion?

> LUIS interprets the variations in user utterances and returns consistent numeric values.

English

{
    "Input": "4.800",
    "Results": [
      {
        "Text": "4.800",
        "TypeName": "number",
        "Resolution": {
          "subtype": "decimal",
          "value": "4.8"
        }
      }
    ]
}

German

{
    "Input": "2.000,352",
    "NotSupportedByDesign": "python",
    "NotSupported": "javascript",
    "Results": [
      {
        "Text": "2.000,352",
        "TypeName": "number",
        "Resolution": {
          "value": "2000,352"
        }
      }
    ]
  }
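Until the output is made consistent, a consumer could normalize the resolution value on its own side. The following is only a minimal sketch of that idea, not part of Recognizers-Text: the helper `to_invariant` and the `SEPARATORS` table are hypothetical names, and the separators listed for each culture are assumptions that would need checking for every culture actually in use.

```python
# Hypothetical consumer-side helper; not part of the Recognizers-Text API.
# Assumed (decimal separator, group separator) pairs per culture.
SEPARATORS = {
    "en-us": (".", ","),
    "de-de": (",", "."),
}


def to_invariant(value: str, culture: str) -> str:
    """Rewrite a culture-formatted number string to use '.' as decimal separator."""
    decimal_sep, group_sep = SEPARATORS[culture]
    # Drop grouping separators first, then swap in the invariant decimal point.
    without_groups = value.replace(group_sep, "")
    return without_groups.replace(decimal_sep, ".")


english = to_invariant("4.8", "en-us")       # value from the English example
german = to_invariant("2000,352", "de-de")   # value from the German example
print(english, german)
```

With this in place, both values can be passed to `float()` regardless of the culture the recognizer ran with.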

iMicknl avatar Aug 07 '18 21:08 iMicknl

Yes, all values should use a consistent format (. as decimal separator). This is a known issue and a fix is on the way. Consumers of the packages need time to adapt to potential breaking changes. Thanks for following up on this!

!! This has been cancelled as the issue was considered a breaking change !!

tellarin avatar Aug 08 '18 03:08 tellarin

While working on the NumberRangeModel for Dutch, I am facing this issue again. If I use the comma as the decimal separator, it breaks the output value, for example: (0,5,). What would be the suggested approach for the time being? (The NumberRangeModel still needs quite some work, so there will be no PR or release in the short term.)

    "Results": [
      {
        "Text": "Meer dan de helft",
        "TypeName": "numberrange",
        "Resolution": {
          "value": "(0.5,)"
        }
      }
    ],

Message: Assert.AreEqual failed. Expected:<(0.5,)>. Actual:<(0,5,)>. Input: "Meer dan 1/2 van de mensen is aanwezig."
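The range notation already uses a comma to separate the lower and upper bound, so a decimal comma makes a value like (0,5,) ambiguous. One possible interim approach is to build the range string from numeric bounds with a fixed dot separator instead of from culture-formatted strings. This is a rough sketch under that assumption; `format_range` is a hypothetical helper, not a Recognizers-Text function, and it only models open intervals.

```python
from typing import Optional


def format_range(lower: Optional[float] = None, upper: Optional[float] = None) -> str:
    """Format an open interval with '.' as decimal separator, whatever the locale.

    A missing bound is left empty, matching the "(0.5,)" shape from the test.
    """
    # repr() of a Python float always uses '.', independent of locale settings.
    lo = "" if lower is None else repr(float(lower))
    hi = "" if upper is None else repr(float(upper))
    return f"({lo},{hi})"


print(format_range(lower=0.5))  # the "Meer dan de helft" case
```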

iMicknl avatar Sep 10 '18 21:09 iMicknl

Describe the bug: The Parse function in https://github.com/Microsoft/Recognizers-Text/blob/5a9e0c701794df544c2d6b458c5c46374b14ce5b/.NET/Microsoft.Recognizers.Text.Number/Parsers/BaseNumberParser.cs#L857

does not take a culture parameter. This breaks the "two hundred point seventy-one" test case (TestNumber_English --> NumberModel) on systems where the culture's decimal separator is not a dot.

To Reproduce: Steps to reproduce the behavior:

  1. Use a system where the culture is set to, e.g., Dutch
  2. Run the "two hundred point seventy-one" test case under TestNumber_English --> NumberModel (set a conditional breakpoint testSpec.Input.Contains(" point seventy") on line 336 in TestBase.cs)
  3. The actual result is 271 instead of 200.71, because 0.71 is parsed as 71
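The effect described in the steps above can be simulated outside .NET. The sketch below is an illustration only and does not reproduce the code in BaseNumberParser.cs: `parse_with_culture` is a hypothetical stand-in for a culture-sensitive parse, under the assumption that such a parse ignores the culture's group separator and treats only the culture's decimal separator as the decimal point.

```python
def parse_with_culture(text: str, decimal_sep: str, group_sep: str) -> float:
    """Simulated culture-sensitive parse (hypothetical, not the .NET API).

    Group separators are dropped; the culture's decimal separator becomes '.'.
    """
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


fraction_text = "0.71"  # intermediate string for "point seventy-one"

# Invariant-style settings: '.' is the decimal separator.
invariant = parse_with_culture(fraction_text, decimal_sep=".", group_sep=",")

# Dutch-style settings: '.' is a group separator, so it is silently dropped.
dutch = parse_with_culture(fraction_text, decimal_sep=",", group_sep=".")

print(200 + invariant)  # intended result for "two hundred point seventy-one"
print(200 + dutch)      # result when the system culture leaks into parsing
```

Passing the separators (or an invariant culture) explicitly, rather than relying on the system default, is what keeps the first result stable across machines.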

Platform (please complete the following information):

  • Platform: .NET
  • Environment: Source code
  • Version of package: commit https://github.com/Microsoft/Recognizers-Text/commit/65d122193cdfd0ac5973a8ffaaf07aecd39960f8

Additional context: This seems closely related to this existing bug, so I decided to add it as a comment rather than open a new issue.

hansmbakker avatar Mar 28 '19 21:03 hansmbakker

@hansmbakker, also related to #885.

iMicknl avatar Mar 28 '19 22:03 iMicknl

@tellarin, any update regarding this issue? I am facing #885 again. If this issue is not a priority anymore, it would be great if we could reopen #885. I could have a look to make sure the test suite always uses the same culture settings.

iMicknl avatar Jul 17 '20 13:07 iMicknl

@tellarin, should we address this one?

aitelint avatar Nov 03 '20 05:11 aitelint

Issue still pending as changing the default output was considered a breaking change.

tellarin avatar Apr 30 '21 07:04 tellarin