
Feature to handle individual digits in parse_number

Open arnavkapoor opened this issue 4 years ago • 2 comments

One use case, I think, could be parsing phone numbers or zip codes. These might be written as "two three zero two five eight", with each digit spelled out. Using parse returns the space-separated string 2 3 0 2 5 8, while parse_number gives None. Neither gives the wanted output, the number 230258. (Of course, the user can do some additional processing on the parse output, which will work, but having this feature in the library itself might be better.)

We could add a parameter to parse_number, say relaxed, which when set to True builds the individual digits up into one large number.
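For reference, the additional processing on the parse output could look roughly like the sketch below. `join_digits` is a hypothetical helper and not part of number-parser; it assumes its input is the space-separated string that parse is described as returning above.

```python
from typing import Optional


def join_digits(parsed: str) -> Optional[int]:
    """Collapse a space-separated run of single digits into one integer.

    Hypothetical helper, not part of number-parser. `parsed` is assumed to be
    the output of parse() for input such as 'two three zero two five eight',
    i.e. the string '2 3 0 2 5 8'.
    """
    tokens = parsed.split()
    # Accept only input made up entirely of single ASCII digits; anything
    # else returns None, mirroring what parse_number does today.
    if not tokens or not all(len(t) == 1 and t in "0123456789" for t in tokens):
        return None
    return int("".join(tokens))


print(join_digits("2 3 0 2 5 8"))
```

One design point to keep in mind: converting to int drops leading zeros, so a zip code such as "zero two one three four" would become 2134. For phone numbers and zip codes, returning the joined string may be the safer choice.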

arnavkapoor avatar Aug 26 '20 09:08 arnavkapoor

It sounds interesting.

We could also use a parameter like join_delimiter, or something similar, to choose how consecutive numbers are joined (that is, numbers that follow one another once spaces, commas, etc. are omitted).

Examples:

>>> parse('I have three numbers: one, two, three', join_delimiter='-')
'I have 3 numbers: 1-2-3'

>>> parse('I have three numbers: one, two, three', join_delimiter='')
'I have 3 numbers: 123'

>>> parse('I have three numbers: one, two, three', join_delimiter='/')
'I have 3 numbers: 1/2/3'


>>> parse('two three zero two five eight', join_delimiter='.')
'2.3.0.2.5.8'
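Until something like join_delimiter exists, the same effect can be approximated by post-processing the string that parse returns. The sketch below is only an illustration: `join_number_runs` is a hypothetical name, and it assumes parse has already turned the words into digits (for example 'I have 3 numbers: 1, 2, 3').

```python
import re

# A run of two or more integers separated only by whitespace and/or commas.
_NUMBER_RUN = re.compile(r"\b\d+(?:[\s,]+\d+)+\b")


def join_number_runs(text: str, join_delimiter: str = " ") -> str:
    """Rejoin each run of consecutive numbers in `text` with `join_delimiter`.

    Hypothetical post-processing helper, not part of number-parser.
    """
    return _NUMBER_RUN.sub(
        # Pull the integers out of the matched run and rejoin them.
        lambda match: join_delimiter.join(re.findall(r"\d+", match.group())),
        text,
    )


print(join_number_runs("I have 3 numbers: 1, 2, 3", join_delimiter="-"))
print(join_number_runs("2 3 0 2 5 8", join_delimiter=""))
```

The lone 3 in "3 numbers" is left untouched because it is followed by a word rather than another number. Decimals and other number formats are not handled by this pattern.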

noviluni avatar Aug 26 '20 09:08 noviluni

@noviluni sir, this is what I am thinking. What do you suggest?

From this code

        myvalue = _build_number(tokens_taken, lang_data)
        for each_number in myvalue:
            current_sentence.append(each_number)
            current_sentence.append(" ") 

To this code

    if tokens_taken:
        myvalue = _build_number(tokens_taken, lang_data)
        # Fall back to a single space when no delimiter is given. Compare
        # against None so that an empty-string delimiter ('') is honoured.
        separator = " " if delimiter is None else delimiter
        for each_number in myvalue:
            current_sentence.append(each_number)
            current_sentence.append(separator)

Here, I have added a new parameter, delimiter, to the parse function.
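One thing worth checking with this approach: appending the delimiter after every number also appends it after the last one, so a delimiter such as '-' may leave a trailing '-' in the sentence. A minimal standalone sketch of an alternative is below. `append_numbers` is a hypothetical function, and it assumes the built numbers are strings and that the sentence is a list of string fragments joined later.

```python
from typing import List, Optional


def append_numbers(sentence: List[str], numbers: List[str],
                   delimiter: Optional[str] = None) -> List[str]:
    """Append `numbers` to `sentence`, separated (not terminated) by `delimiter`.

    Hypothetical sketch, not the library's code. None means "use a space",
    so an empty string can still be passed to join the digits directly.
    """
    separator = " " if delimiter is None else delimiter
    # str.join places the separator between items only, never after the last.
    sentence.append(separator.join(numbers))
    return sentence


print("".join(append_numbers(["I have: "], ["1", "2", "3"], "-")))
```

Using None rather than a truthiness test as the "not provided" marker matters here, because an empty string is a meaningful delimiter in the join_delimiter='' example earlier in the thread.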

Links for the code

https://github.com/scrapinghub/number-parser/blob/dab1f31c2fef1cd7e9881564136312d96de86385/number_parser/parser.py#L308

https://github.com/scrapinghub/number-parser/blob/dab1f31c2fef1cd7e9881564136312d96de86385/number_parser/parser.py#L287

NEERAJAP2001 avatar Oct 20 '20 13:10 NEERAJAP2001