number-parser

Support ordinals

Open noviluni opened this issue 5 years ago • 3 comments

I'm opening this ticket to track the ordinals feature.

From my understanding, what we should achieve is:

>>> parse('first')
'1st'

>>> parse('second')
'2nd'

>>> parse('third')
'3rd'

>>> parse('twenty-third')
'23rd'

>>> parse('thirtieth')
'30th'
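The output format above ('1st', '23rd', '30th') could be produced from an integer with a small helper. This is only a sketch of the English suffix rule; `to_ordinal_string` is a hypothetical name, not something that exists in number-parser.

```python
def to_ordinal_string(n: int) -> str:
    """Format a non-negative integer with its English ordinal suffix."""
    # Numbers ending in 11, 12 or 13 always take "th" (11th, 112th, ...).
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        # Otherwise the last digit decides; anything but 1, 2, 3 takes "th".
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


print(to_ordinal_string(23))  # 23rd
```

The rule is English-specific, so a helper like this would belong with the language data rather than in the shared parsing code.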

However, as we support other words in the sentence, we should probably handle some ambiguous words. I would take special care with "second". I think it should be translated to "2nd" only when it's not preceded by:

  • 1 (example: "1 second")
  • one (example: "one second")
  • a (example: "a second")
  • another ordinal (examples: "first second" --> "1st second" or "fourth second" --> "4th second").

Of course, this logic would probably only need to be applied to some languages, so it shouldn't be inside the main logic but in a language-specific section.
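A minimal sketch of the "second" rule, assuming whitespace-separated tokens and no punctuation handling. The tables `ORDINALS` and `BLOCKERS` and the function `convert_ordinals` are hypothetical names used only for illustration; they are not part of number-parser.

```python
# Hypothetical English-only tables, kept tiny for the example.
ORDINALS = {"first": "1st", "second": "2nd", "third": "3rd", "fourth": "4th"}
BLOCKERS = {"1", "one", "a"}  # words after which "second" is a time unit


def convert_ordinals(text: str) -> str:
    """Replace ordinal words, leaving "second" alone when it is ambiguous."""
    out = []
    prev = None  # previous original token, lowercased
    for token in text.split():
        low = token.lower()
        if low in ORDINALS:
            # "second" stays untouched after 1 / one / a / another ordinal.
            ambiguous = low == "second" and (prev in BLOCKERS or prev in ORDINALS)
            out.append(token if ambiguous else ORDINALS[low])
        else:
            out.append(token)
        prev = low
    return " ".join(out)


print(convert_ordinals("fourth second"))  # 4th second
print(convert_ordinals("the second"))     # the 2nd
```

Keeping the blocker list as data makes it easy to place in a language-specific section, since other languages may have no such ambiguity or a different one.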

noviluni, Jun 17 '20 19:06

Hi @noviluni, I have begun working on support for ordinal numbers. I believe the best approach is to create a structure similar to the one used for cardinal numbers. Another option was to extend the cardinal numbers to handle ordinals too, storing only an additional suffix (for example, "th" for English). However, in other languages ordinal and cardinal numbers differ substantially, so a suffix is not enough:

22 - veintidós
22nd - vigésimo segundo

So I plan to update the data files with the structure proposed below. I am also thinking of adding tokens for negative and decimal numbers to support future features (for English, NEGATIVE_TOKENS might be 'minus' and 'negative', and DECIMAL_TOKENS would be 'point' and 'dot').

{
    "CARDINAL_NUMBERS": {
        "UNIT_NUMBERS": {},
        "DIRECT_NUMBERS": {},
        "TENS": {},
        "HUNDREDS": {},
        "BIG_POWERS_OF_TEN": {}
    },
    "ORDINAL_NUMBERS": {
        "UNIT_NUMBERS": {},
        "DIRECT_NUMBERS": {},
        "TENS": {},
        "HUNDREDS": {},
        "BIG_POWERS_OF_TEN": {}
    },
    "SKIP_TOKENS": [],
    "NEGATIVE_TOKENS": [],
    "DECIMAL_TOKENS": [],
    "LONG_SCALE": false
}
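To illustrate how a parser might consume this layout, here is a rough sketch. The word-to-value mappings in `SAMPLE_EN` are my assumption (the proposal above leaves the groups empty), `flatten` and `parse_ordinal_value` are hypothetical helpers, and the logic only sums values, so hundreds and larger powers of ten are not handled.

```python
# Hypothetical sample data in the proposed layout (values are assumptions).
SAMPLE_EN = {
    "CARDINAL_NUMBERS": {
        "UNIT_NUMBERS": {"one": 1, "two": 2, "three": 3},
        "DIRECT_NUMBERS": {"eleven": 11, "twelve": 12},
        "TENS": {"twenty": 20, "thirty": 30},
        "HUNDREDS": {},
        "BIG_POWERS_OF_TEN": {},
    },
    "ORDINAL_NUMBERS": {
        "UNIT_NUMBERS": {"first": 1, "second": 2, "third": 3},
        "DIRECT_NUMBERS": {"eleventh": 11, "twelfth": 12},
        "TENS": {"twentieth": 20, "thirtieth": 30},
        "HUNDREDS": {},
        "BIG_POWERS_OF_TEN": {},
    },
    "SKIP_TOKENS": ["and"],
    "NEGATIVE_TOKENS": [],
    "DECIMAL_TOKENS": [],
    "LONG_SCALE": False,
}


def flatten(section):
    """Merge all word groups of one section into a single lookup dict."""
    lookup = {}
    for group in section.values():
        lookup.update(group)
    return lookup


def parse_ordinal_value(phrase, data):
    """Return the integer value of an ordinal phrase, or None if not ordinal."""
    cardinals = flatten(data["CARDINAL_NUMBERS"])
    ordinals = flatten(data["ORDINAL_NUMBERS"])
    words = [w for w in phrase.lower().replace("-", " ").split()
             if w not in data["SKIP_TOKENS"]]
    # The last word must be an ordinal ("twenty-third", not "twenty-three").
    if not words or words[-1] not in ordinals:
        return None
    total = ordinals[words[-1]]
    for word in words[:-1]:
        # Leading words may be cardinal (English) or ordinal (e.g. Spanish).
        value = cardinals.get(word, ordinals.get(word))
        if value is None:
            return None
        total += value
    return total


print(parse_ordinal_value("twenty-third", SAMPLE_EN))  # 23
```

Accepting ordinal words in leading positions is what would let a separate ORDINAL_NUMBERS table cover forms like "vigésimo segundo", which a suffix-only extension of the cardinal data could not express.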

arnavkapoor, Aug 01 '20 10:08

Hi @arnavkapoor! It looks good! However, I'm not 100% sure about adding negative and decimal tokens right now, for two reasons:

  • When we try to implement it, we may run into aspects we haven't considered yet, and this approach might not work.
  • Every time we release a package, we usually expect the code to be "stable". Adding these tokens without using them could confuse users and devs. We could wait until negative and decimal numbers are implemented before releasing a new version, but we will probably need to release a version with ordinal number support before then.

Does this make sense?

The naming is ok :). Maybe we could rename CARDINAL_NUMBERS to just NUMBERS, but it's up to you.

noviluni, Aug 03 '20 07:08

Currently, ordinal number support exists only for English (see https://github.com/arnavkapoor/number-parser/pull/31#pullrequestreview-461492622 ). Changes are needed to incorporate other languages. One way could be to update _apply_cardinal_conversion, mentioned in https://github.com/arnavkapoor/number-parser/pull/31#issuecomment-669913867 , so that it works for other languages. Another could be to create the same structure for ordinal numbers as for cardinal numbers. The merged PR for English ordinal number support is #35.

arnavkapoor, Aug 26 '20 09:08