Numsense

tryParseEnglish "twentytwenty" returns 40

Open ploeh opened this issue 9 years ago • 5 comments

tryParseEnglish "twentytwenty" returns Some 40, which is surprising, to say the least. This form was never an explicit test case, but it is a fairly standard idiom in the language, particularly when referring to years:

  • nineteen eighty-four (1984)
  • twenty-sixteen (2016)
  • fourteen fifty-three (1453)

There are two potential ways to address such numbers:

  1. If they are unambiguous, a better result from tryParseEnglish "twentytwenty" would be Some 2020. While I suspect that they are unambiguous, anyone can, and is welcome to, prove me wrong with only a single counter-example.
  2. If such numerals are ambiguous, the correct return value would be None.
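
To make the surprise concrete, here is a toy sketch of how a purely additive, prefix-matching parser could arrive at 40. It is not Numsense's actual implementation: `toyParse`, `words`, and the three-word vocabulary are made up for illustration, and the only assumption is that each recognized word is stripped from the front and its value added to an accumulator.

```fsharp
open System

// Hypothetical toy vocabulary; a real parser knows far more words.
let words = [ "twenty", 20; "sixteen", 16; "four", 4 ]

// Hypothetical helper: strip known words from the front of the input
// and add up their values, with no check on which word may follow which.
let toyParse (input : string) : int option =
    let rec loop acc (rest : string) =
        if rest = "" then
            Some acc
        else
            words
            |> List.tryFind (fun (w, _) -> rest.StartsWith(w, StringComparison.Ordinal))
            |> Option.bind (fun (w, v) -> loop (acc + v) (rest.Substring(w.Length)))
    let normalized = input.ToLowerInvariant().Replace("-", "").Replace(" ", "")
    if normalized = "" then None else loop 0 normalized

toyParse "twentyfour"      // Some 24, as intended
toyParse "twentytwenty"    // Some 40, because 20 + 20 is summed blindly
toyParse "twenty-sixteen"  // Some 36, for the same reason
```

Under this scheme nothing distinguishes "twenty" followed by a unit from "twenty" followed by another "twenty", which is why both readings above collapse into a sum.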

ploeh • Feb 03 '16 12:02

Similar to #29

ploeh • Feb 03 '16 12:02

Is that really a bug? We're not parsing years, we're parsing numbers. There are a lot of other shorthands for particular years, say the "sixties" or the Chinese years, that we're not parsing either. My point is that years are not pure numbers.

ncave • Feb 07 '16 09:02

> we're parsing numbers.

Agreed. These numbers may be years, but they may also be other types of numbers.

According to that argument, tryParseEnglish "twentytwenty" should not return Some 2020. On the other hand, neither should it return Some 40, so I'm still inclined to consider the current implementation defective.

ploeh • Feb 10 '16 15:02

What should it return then, a parsing error or a list of numbers (if we can detect a logical separation)? It's unusual to have a sequence of numbers without some separator, unless its structure is known beforehand (i.e. treating a number as a list of numbers of a certain fixed size, e.g. 1 or 2).
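
As a rough sketch of the "list of numbers" option, assuming the input has already been turned into a list of word values below 100: a unit (1 to 9) attaches to a preceding multiple of ten (20 to 90), and anything else starts a new number. `splitNumerals` is a hypothetical name, not part of Numsense.

```fsharp
// Hypothetical helper: group word values into separate numbers.
// Assumes every input value is a single English word below 100.
let splitNumerals (values : int list) : int list =
    let folder acc v =
        match acc with
        // A unit directly after twenty..ninety extends that number.
        | prev :: rest when prev >= 20 && prev < 100 && prev % 10 = 0 && v >= 1 && v <= 9 ->
            (prev + v) :: rest
        // Otherwise the value starts a new number.
        | _ -> v :: acc
    values |> List.fold folder [] |> List.rev

splitNumerals [ 20; 20 ]     // [20; 20]  ("twenty twenty")
splitNumerals [ 19; 80; 4 ]  // [19; 84]  ("nineteen eighty-four")
```

This only detects the separation; whether the caller then reads [19; 84] as 1984 is a separate decision.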

ncave • Feb 10 '16 17:02

If such numerals are ambiguous, the correct return value would be None.
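
A minimal sketch of what returning None could look like for numbers below one hundred, assuming the input has already been tokenized. The `Token` type and `strictBelowHundred` are hypothetical names, not Numsense's API; the idea is simply to accept the few well-formed shapes and reject everything else.

```fsharp
// Hypothetical token classes for English number words below 100.
type Token =
    | Unit of int   // one .. nine
    | Teen of int   // ten .. nineteen
    | Tens of int   // twenty, thirty, .. ninety

// Accept only well-formed sequences; any other combination is None.
let strictBelowHundred (tokens : Token list) : int option =
    match tokens with
    | [ Unit u ] -> Some u
    | [ Teen t ] -> Some t
    | [ Tens t ] -> Some t
    | [ Tens t; Unit u ] -> Some (t + u)
    | _ -> None

strictBelowHundred [ Tens 20; Unit 4 ]   // Some 24
strictBelowHundred [ Tens 20; Tens 20 ]  // None ("twentytwenty")
strictBelowHundred [ Tens 20; Teen 16 ]  // None ("twenty-sixteen")
```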

ploeh • Feb 10 '16 17:02