number-parser icon indicating copy to clipboard operation
number-parser copied to clipboard

Ordinal number support

Open arnavkapoor opened this issue 4 years ago • 2 comments

Enhacing library I have made changes in the base structure of the files to support ordinals.

arnavkapoor avatar Aug 04 '20 14:08 arnavkapoor

Hi @noviluni sorry for the delayed response , I had composed a reply last night but forgot to hit send it seems. 😬

  1. I did consider using the approach ordinal_numbers -> cardinal_numbers -> number , however I thought of certain cases that might give some issue. If we have something like
    twentieth seventh fiftieth third this approach would give us 27 53 as output. As opposed to 20 7 50 3. A presence of an ordinal number indicates the end of the number so this would need to be incorporated. This along with the need to create a complete mapping for other languages like Spanish made me feel that this expanded structure (with same categories for ordinal numbers) would be more robust in the long term.

However if the primary goal is to have a basic integration with date-parser , definitely we can have some sort of mapping in place as suggested so that most of the dates and simple cases in English work. I will create a PR with the code that you have given.

  1. Thanks for the inputs with respect to long scale implementation , I will create a separate PR with the suggested changes. I did change the variable because it was boolean , however wanted to change it to IS_LONG_SCALE didn't realize I had missed the scale. Going forward USE_LONG_SCALE seems to be the most verbose and accurate one , so will use it.

arnavkapoor avatar Aug 06 '20 07:08 arnavkapoor

If we have something like twentieth seventh fiftieth third this approach would give us 27 53 as output. As opposed to 20 7 50 3

Hmm... that's only true if you implement the parse() method, but I was talking about the parse_ordinal method and this is not happening.

Let's see (using the function I wrote):

>>> parse_ordinal('twentieth')
20

>>> parse_ordinal('twentieth seventh')
None

>>> parse_ordinal('twenty seventh')
27

So, this is what I want you to do:

Open a new draft PR and add these new functions:

def  is_cardinal_token(token, language):
    ...  # Using the LanguageData for the language detects if the word is part of a number

def is_ordinal_token(token, language):
    token = _apply_cardinal_converstion(input_string, language)
    retutrn is_ordinal_token(token, language)

def is_number_token(token, language):
    return is_cardinal_token(token, language) or is_ordinal_token(token, language)

def parse_ordinal(input_string, language='en'):
    _apply_cardinal_converstion(input_string, language)
    return parse_number(input_string)

def _apply_cardinal_converstion(input_string, language):
    CARDINAL_DIRECT_NUMBERS = {'first': 'one', 'second': 'two', 'third': 'three', 'fifth': 'five', 'eighth': 'eight', 'ninth': 'nine', 'twelfth': 'twelve'}  # thiw will be coming from the LanguageData
    input_string = input_string.lower() 
    for word, number in CARDINAL_DIRECT_NUMBERS.items(): 
        input_string = input_string.replace(word, number) 
    input_string = re.sub(r'ieth$', 'y', input_string) 
    input_string = re.sub(r'th$', '', input_string) 
    return input_string

We will rewrite completely the _apply_cardinal_converstion() but for now it's ok.

After building this, we will change the parse() method to use directly the parse_number() and parse_ordinal() instead of the _build_number() by checking the tokens with is_number_token() (and/or is_cardinal_token() and is_ordinal_token()) and we will refactor the _build_number() too, but let's discuss this after submitting the draft PR.

I think this approach will be better for testing and will ease a lot the integration with dateparser. :smile:

noviluni avatar Aug 06 '20 13:08 noviluni