Recognizers-Text
Recognizers-Text copied to clipboard
[* Number] Output isn't consistent across cultures
When using the NumberModel across multiple cultures in a single application, it can be quite confusing. For example; in Europe the decimal seperator is a comma (,) instead of a dot (.). This is parsed correctly by the Model, however in the output value
it is also reflected.
It this something which should be changed in Recognizers-Text? Or something that should be added to the documentation to avoid confusion.
> LUIS interprets the variations in user utterances and returns consistent numeric values.
English
{
"Input": "4.800",
"Results": [
{
"Text": "4.800",
"TypeName": "number",
"Resolution": {
"subtype": "decimal",
"value": "4.8"
}
}
]
},
German
{
"Input": "2.000,352",
"NotSupportedByDesign": "python",
"NotSupported": "javascript",
"Results": [
{
"Text": "2.000,352",
"TypeName": "number",
"Resolution": {
"value": "2000,352"
}
}
]
}
Yes, all values should use a consistent format (. as decimal separator). This is a known issue and a fix is on the way. Consumers of the packages need time to adapt to potential breaking changes.
Thanks for following up on this!
!! This has been cancelled as the issue was considered a breaking change !!
While working on the NumberRangeModel for Dutch, I am facing this issue again. If I use the comma as decimal seperator, it messes up with the outcome. example: (0,5,) What would be the suggested approach for the time being? (NumberRangeModel is quite some work still, so no short time PR / release)
"Results": [
{
"Text": "Meer dan de helft",
"TypeName": "numberrange",
"Resolution": {
"value": "(0.5,)"
}
}
],
Message: Assert.AreEqual failed. Expected:<(0.5,)>. Actual:<(0,5,)>. Input: "Meer dan 1/2 van de mensen is aanwezig."
Describe the bug
The Parse
function in
https://github.com/Microsoft/Recognizers-Text/blob/5a9e0c701794df544c2d6b458c5c46374b14ce5b/.NET/Microsoft.Recognizers.Text.Number/Parsers/BaseNumberParser.cs#L857
does not take a culture parameter which breaks the TestNumber_English
--> NumberModel
--> testcase two hundred point seventy-one
testcase on systems where the culture has a decimal separator that is not a dot.
To Reproduce Steps to reproduce the behavior:
- Have a system where the culture is e.g. set to dutch
- Run the
TestNumber_English
-->NumberModel
--> testcasetwo hundred point seventy-one
(set a conditional breakpointtestSpec.Input.Contains(" point seventy")
on line 336 inTestBase.cs
) - Actual result is
271
instead of200.71
because0.71
is parsed as71
Platform (please complete the following information):
- Platform: .Net
- Environment: Source code
- Version of package: commit https://github.com/Microsoft/Recognizers-Text/commit/65d122193cdfd0ac5973a8ffaaf07aecd39960f8
Additional context This seems very related with this existing bug so I decided to add it as a comment rather than a new bug
@hansmbakker, also related to #885.
@tellarin any update regarding this issue? I am facing #885 again. If this issue is not a priority anymore, it would be great if we could reopen #885. I could have a look to make sure the test suite always uses the same settings..
@tellarin, should we address this one?
Issue still pending as changing the default output was considered a breaking change.