Support ordinals
I'm opening this ticket to track the ordinals feature.
From my understanding, what we should achieve is:
>>> parse('first')
'1st'
>>> parse('second')
'2nd'
>>> parse('third')
'3rd'
>>> parse('twenty-third')
'23rd'
>>> parse('thirtieth')
'30th'
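For English, the numeric form in the outputs above is the integer followed by a suffix. A minimal sketch of that suffix rule follows; `ordinal_suffix` and `to_ordinal_string` are hypothetical names used only for illustration, not existing functions of the library:

```python
def ordinal_suffix(n: int) -> str:
    """Return the English ordinal suffix for a non-negative integer."""
    # 11, 12 and 13 (also 111, 212, ...) take "th" even though they
    # end in 1, 2 or 3, so the last two digits are checked first.
    if 10 <= n % 100 <= 20:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def to_ordinal_string(n: int) -> str:
    """Format an integer as an English ordinal string such as '23rd'."""
    return f"{n}{ordinal_suffix(n)}"
```

This only covers the formatting side for English; recognising words such as 'twenty-third' in text is a separate step.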
However, since we also support other words in the sentence, we should handle ambiguous words carefully. I would take special care with "second". I think it should be translated to "2nd" only when it's not preceded by:
- 1 (example: "1 second")
- one (example: "one second")
- a (example: "a second")
- another ordinal (examples: "first second" --> "1st second" or "fourth second" --> "4th second")
Of course, this logic will probably apply only to some languages, so it shouldn't live in the main logic but in a language-specific section.
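The rule for "second" could be sketched roughly as below. This is only an illustration under stated assumptions: the `ORDINALS` table, the `UNIT_TRIGGERS` set and the `convert_ordinals` function are hypothetical and English-only, tokens are split on whitespace with no punctuation handling, and cardinal words such as "one" are deliberately left untouched.

```python
# Hypothetical, English-only data for illustration; not the library's tables.
ORDINALS = {"first": "1st", "second": "2nd", "third": "3rd", "fourth": "4th"}
# Words after which "second" is read as a unit of time, not an ordinal.
UNIT_TRIGGERS = {"1", "one", "a"}


def convert_ordinals(text: str) -> str:
    """Replace ordinal words, leaving "second" alone when it is a time unit."""
    out = []
    prev_word = ""
    prev_was_ordinal = False
    for token in text.split():
        word = token.lower()
        is_ordinal = word in ORDINALS
        if word == "second" and (prev_word in UNIT_TRIGGERS or prev_was_ordinal):
            # Preceded by 1 / one / a / another ordinal: keep the word as-is.
            out.append(token)
            is_ordinal = False
        elif is_ordinal:
            out.append(ORDINALS[word])
        else:
            out.append(token)
        prev_word = word
        prev_was_ordinal = is_ordinal
    return " ".join(out)
```

For instance, `convert_ordinals("the second item")` gives `"the 2nd item"`, while `convert_ordinals("fourth second")` gives `"4th second"`. Keeping the trigger words in a per-language table is one way to keep this out of the main logic.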
Hi @noviluni, I have begun working on support for ordinal numbers. I believe the best approach is to create a structure similar to the one used for cardinal numbers. Another option was to extend the cardinal numbers to handle ordinals too (storing only an additional suffix, for example "th" for English). However, in other languages ordinal and cardinal numbers differ substantially, for example in Spanish:
22 - veintidós
22nd - vigésimo segundo
So I plan to update the data files with the following proposed structure. I am also thinking of adding tokens for negative and decimal numbers for future features (for English, negative_tokens might be 'minus' and 'negative', and decimal_tokens would be 'point' and 'dot').
{
  "CARDINAL_NUMBERS": {
    "UNIT_NUMBERS": {},
    "DIRECT_NUMBERS": {},
    "TENS": {},
    "HUNDREDS": {},
    "BIG_POWERS_OF_TEN": {}
  },
  "ORDINAL_NUMBERS": {
    "UNIT_NUMBERS": {},
    "DIRECT_NUMBERS": {},
    "TENS": {},
    "HUNDREDS": {},
    "BIG_POWERS_OF_TEN": {}
  },
  "SKIP_TOKENS": [],
  "NEGATIVE_TOKENS": [],
  "DECIMAL_TOKENS": [],
  "LONG_SCALE": false
}
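To make the proposal more concrete, here is a rough sketch of how such a data file might be consumed. Everything in it is an assumption for illustration: the sample English entries, the word-to-integer value format, and the `flatten` and `ordinal_value` helpers are hypothetical. It only adds up tens and units, so multiplicative forms such as "two hundredth" are out of scope.

```python
import json

# Hypothetical English sample following the proposed layout.
LANGUAGE_DATA = json.loads("""
{
  "CARDINAL_NUMBERS": {
    "UNIT_NUMBERS": {"one": 1, "two": 2, "three": 3},
    "DIRECT_NUMBERS": {"eleven": 11},
    "TENS": {"twenty": 20, "thirty": 30},
    "HUNDREDS": {"hundred": 100},
    "BIG_POWERS_OF_TEN": {"thousand": 1000}
  },
  "ORDINAL_NUMBERS": {
    "UNIT_NUMBERS": {"first": 1, "second": 2, "third": 3},
    "DIRECT_NUMBERS": {"eleventh": 11},
    "TENS": {"twentieth": 20, "thirtieth": 30},
    "HUNDREDS": {"hundredth": 100},
    "BIG_POWERS_OF_TEN": {"thousandth": 1000}
  },
  "SKIP_TOKENS": ["and"],
  "NEGATIVE_TOKENS": [],
  "DECIMAL_TOKENS": [],
  "LONG_SCALE": false
}
""")


def flatten(section):
    """Merge all groups of a section into one word -> value mapping."""
    merged = {}
    for group in section.values():
        merged.update(group)
    return merged


CARDINALS = flatten(LANGUAGE_DATA["CARDINAL_NUMBERS"])
ORDINALS = flatten(LANGUAGE_DATA["ORDINAL_NUMBERS"])


def ordinal_value(phrase):
    """Return the integer value of a simple ordinal phrase, or None."""
    words = [w for w in phrase.lower().replace("-", " ").split()
             if w not in LANGUAGE_DATA["SKIP_TOKENS"]]
    # The phrase counts as an ordinal only if its last word is one.
    if not words or words[-1] not in ORDINALS:
        return None
    total = ORDINALS[words[-1]]
    for word in words[:-1]:
        # Earlier words may be cardinal ("twenty") or ordinal ("vigésimo").
        value = CARDINALS.get(word, ORDINALS.get(word))
        if value is None:
            return None
        total += value
    return total
```

With this sample data, `ordinal_value("twenty-third")` evaluates to 23 and `ordinal_value("twenty")` to `None`. Looking up earlier words in both tables is what would let a language like Spanish, where both words of "vigésimo segundo" are ordinal forms, share the same code path.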
Hi @arnavkapoor! It looks good! However, I'm not 100% sure about adding negative and decimal tokens right now, for two reasons:
- When we try to implement them, we may run into aspects we aren't considering right now, and this approach might not work.
- Every time we release a package, we usually expect the code to be "stable". Adding these tokens without using them could confuse users/devs and drive them crazy. We could wait until negative and decimal numbers are implemented before releasing a new version, but we will probably need to release the version with ordinal number support before then.
Does this make sense?
About the naming, it's OK :). Maybe we could change CARDINAL_NUMBERS to just NUMBERS, but it's up to you.
Currently, ordinal number support exists only for English: https://github.com/arnavkapoor/number-parser/pull/31#pullrequestreview-461492622 . Changes are needed to incorporate other languages. One way could be to update the _apply_cardinal_conversion mentioned here for other languages: https://github.com/arnavkapoor/number-parser/pull/31#issuecomment-669913867 .
The other could be to create the same structure for ordinal numbers as for cardinal numbers.
The merged PR for ordinal number support for English is #35