Number parsers unreliable in presence of "and" words

Open JuneStepp opened this issue 5 years ago • 2 comments

"Nine hundred and five" returns only "900". Fractions work fine, though: "nine hundred and two tenths" returns "900.2".

JuneStepp avatar Feb 23 '20 00:02 JuneStepp

Nice find. It looks like we're having the opposite problem in Spanish. I'm gonna use this issue to document all the related bugs, so we can write failing tests for the lot of them at once.

>>> extract_number("novecientos y cinco", lang="es")
905
>>> extract_number("novecientos cinco", lang="es")
5

Maintainers: please check for similar bugs in your native languages! I only speak these two.


further diagnosis:

>>> extract_numbers("novecientos veinte y cinco", lang="es")
[905]  # should be [925]; behavior consistent with the English bug would return [900, 25]

this snippet is a possible dupe or cousin of #86
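
To make the "failing tests for the lot of them" idea concrete, here is a minimal sketch that collects the cases reported so far into one table. `REPORTED_CASES` and `failing_cases` are hypothetical names made up for this comment, not part of lingua_franca, and the expected value for "novecientos cinco" (905) is my reading of the Spanish rather than something stated above.

```python
from typing import Callable, List, Tuple

# (utterance, language code, value that should come back)
# Hypothetical table built from the cases reported in this thread.
REPORTED_CASES: List[Tuple[str, str, float]] = [
    ("nine hundred and five", "en", 905),
    ("nine hundred and two tenths", "en", 900.2),
    ("novecientos y cinco", "es", 905),
    ("novecientos cinco", "es", 905),
    ("novecientos veinte y cinco", "es", 925),
]


def failing_cases(
    parse: Callable[[str, str], object],
) -> List[Tuple[str, str, float, object]]:
    """Return (text, lang, expected, actual) for every case `parse` gets wrong."""
    failures = []
    for text, lang, expected in REPORTED_CASES:
        actual = parse(text, lang)
        if actual != expected:
            failures.append((text, lang, expected, actual))
    return failures
```

Any callable taking `(text, lang)` can be plugged in, for example a thin wrapper such as `lambda text, lang: extract_number(text, lang=lang)`. An empty list would mean every reported case passes.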

ChanceNCounter avatar Feb 23 '20 02:02 ChanceNCounter

I'll have to triple-check, but I think this boils down to the fact that we don't yet handle that (grammatically incorrect, but colloquially common) use of "and". I vote willfix; I'm just explaining the cause.

Fractions work because that's the only thing "and" triggers at the moment. However, "nine hundred and five" doesn't have a denominator, so the "five" is split off as a separate number, which extract_numbers would return alongside 900.
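
A toy illustration of that explanation, as a rough sketch only: this is not lingua_franca's parser, and `extract_numbers_sketch`, its word tables, and the `and_joins_integers` flag are all invented for this example. With the flag off, "and" only ever attaches a fraction, which mirrors the behavior described above; with it on, "and" followed by a plain number word keeps building the same number.

```python
from typing import List, Optional, Union

Number = Union[int, float]

# Deliberately tiny vocabularies; a real parser needs far more.
SMALL = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
}
DENOMINATORS = {
    "half": 2, "halves": 2, "third": 3, "thirds": 3,
    "quarter": 4, "quarters": 4, "tenth": 10, "tenths": 10,
}


def extract_numbers_sketch(text: str, and_joins_integers: bool = True) -> List[Number]:
    """Toy extractor: returns the numbers found in `text`, left to right."""
    tokens = text.lower().replace("-", " ").split()
    results: List[Number] = []
    current: Optional[Number] = None  # number being built, if any
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        nxt2 = tokens[i + 2] if i + 2 < len(tokens) else None
        if tok in SMALL:
            current = (current or 0) + SMALL[tok]
        elif tok == "hundred" and current is not None:
            current *= 100
        elif tok == "and" and current is not None and nxt in SMALL:
            if nxt2 in DENOMINATORS:
                # "... and two tenths": add numerator / denominator.
                current += SMALL[nxt] / DENOMINATORS[nxt2]
                i += 2  # skip the numerator and denominator tokens
            elif not and_joins_integers:
                # No denominator follows, so "and" ends the current number.
                results.append(current)
                current = None
            # Otherwise "and" is filler; the next token extends `current`.
        else:
            # Any other word closes the number in progress.
            if current is not None:
                results.append(current)
                current = None
        i += 1
    if current is not None:
        results.append(current)
    return results
```

Under these assumptions, `extract_numbers_sketch("nine hundred and five", and_joins_integers=False)` gives `[900, 5]`, while the default gives `[905]`; "nine hundred and two tenths" gives `[900.2]` either way. The sketch does no validation (it would happily sum "five six"), so it only shows where the "and" branch would need to change.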

ChanceNCounter avatar Mar 10 '20 20:03 ChanceNCounter