
Handle a mixture of digits and spelt out numbers?

Open philgooch opened this issue 6 years ago • 3 comments

Input: 9.7 million, 7 million, 4 thousand, etc.
Expected output: 9700000, 7000000, 4000
Actual output: 1000000, 1000000, 1000

I wonder if there is a straightforward fix for this?

philgooch, Mar 14 '18 18:03
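The expected outputs in this issue amount to multiplying the numeric prefix by the value of the scale word. Below is a minimal standard-library sketch of that arithmetic, assuming the input is exactly `<number> <scale word>`. `parse_scaled` and the `SCALES` table are hypothetical names for illustration, not part of w2n. `Decimal` is used so a prefix like 9.7 is multiplied exactly rather than as a binary float.

```python
from decimal import Decimal

# Hypothetical scale table for this sketch; w2n keeps its own internal word list.
SCALES = {"hundred": 100, "thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}

def parse_scaled(text: str) -> int:
    """Convert '<digits> <scale word>' (e.g. '9.7 million') to an int."""
    prefix, scale = text.strip().lower().split()
    # Decimal keeps '9.7' exact, so the product carries no binary rounding error
    value = Decimal(prefix) * SCALES[scale]
    if value != value.to_integral_value():
        raise ValueError(f"{text!r} does not denote a whole number")
    return int(value)

for s in ["9.7 million", "7 million", "4 thousand"]:
    print(s, "->", parse_scaled(s))
```

Rejecting non-whole results (for example "0.5 hundred" is fine, "0.005 hundred" is not) is a design choice of this sketch; a real fix in w2n might prefer to return a float instead.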

Not straightforward, but it looks doable. Thanks for pointing this out; I will look into it as soon as I get some free time. Meanwhile, you're welcome to fix the error yourself and submit a pull request. If it passes all the tests, it will most probably be merged.

akshaynagpal, Mar 16 '18 14:03

In my own code I've put together a slightly hacky solution by combining this package with num2words. My current code only catches inputs like 9 million, not 9.7 million, but with a small change to the regex it could be done.

Essentially, I use re.sub to find every run of digits in the string I pass to w2n, with a wrapper around num2words as the replacement function (num2words complains if you pass it a str rather than an int). I've done my best to extract it from the rest of my code below:

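import re

from num2words import num2words
from word2number import w2n

def strNum2Words(match):
    # num2words needs a number, so convert the matched digit string to int first
    return num2words(int(match.group()))

origtext = "9 million"  # example input
if re.search(r'\d+', origtext):
    # Replace each run of digits with its spelled-out form, e.g. "9" -> "nine"
    origtext = re.sub(r'\d+', strNum2Words, origtext)
number = w2n.word_to_num(origtext)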

lhami, Nov 25 '18 18:11
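The re.sub-with-a-function mechanism described in the previous comment can be sketched with the standard library alone. In this sketch a hypothetical `SMALL` word list (covering only 0 to 19) stands in for num2words, so it does not reproduce num2words' real output for larger numbers. It also shows why a plain `\d+` pattern does not catch 9.7 million: the decimal point splits the number into two separate digit runs.

```python
import re

# Hypothetical stand-in for num2words, covering only 0-19 so the sketch
# runs with the standard library alone.
SMALL = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]

def spell_match(match):
    # re.sub passes a Match object; convert its text to int before spelling it
    return SMALL[int(match.group())]

def digits_to_words(text):
    # Every run of digits is replaced by whatever spell_match returns
    return re.sub(r"\d+", spell_match, text)

print(digits_to_words("9 million"))    # nine million
print(digits_to_words("4 thousand"))   # four thousand
# "\d+" sees "9.7" as two separate runs, so the decimal point survives as text:
print(digits_to_words("9.7 million"))  # nine.seven million
```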

@lhami thanks for this. I solved the problem in another way in the end. Here's the code:

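import regex  # third-party "regex" module; the stdlib re does not support \p{Pd}
from word2number import w2n

# Group 1 captures an optional numeric prefix such as "9.7 ",
# which is followed by a spelled-out number word.
NUMBERS = r"""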
\b(
\d+
([.]\d+)?
[ ]
)?
\b(
    zero |
    quarters? |
    thirds? |
    half |
    one |
    two |
    three |
    four |
    five |
    six |
    seven |
    eight |
    nine |
    ten |
    eleven |
    twelve |
    thirteen |
    fourteen |
    fifteen |
    sixteen |
    seventeen |
    eighteen |
    nineteen |
    twenty |
    thirty |
    forty |
    fifty |
    sixty |
    seventy |
    eighty |
    ninety |
    hundred |
    thousand |
    million |
    billion
)\b
"""
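# A number (with optional prefix), then any further number words joined by a
# space or a dash (\p{Pd}), optionally with "and", e.g. "two hundred and five".
NUMBER_MATCHER = regex.compile(NUMBERS + r'([ \p{Pd}](and[ ])?' + NUMBERS + r')*', regex.I | regex.VERBOSE)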

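line = "Revenue rose to 9.7 million last year"  # example input line
m = NUMBER_MATCHER.search(line)
if m:
    multiplier = m.group(1)  # e.g. "9.7 ", or None if there is no digit prefix
    try:
        num = w2n.word_to_num(m.group(0))
        if multiplier:
            # round() rather than int(), so a float product like 8199999.999... is not truncated
            num = round(float(multiplier) * num)
    except ValueError:
        # w2n could not parse the words (e.g. "half"); keep the matched text
        num = m.group(0)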

philgooch, Nov 26 '18 10:11
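One hazard in the multiplier step: converting `float(prefix) * value` with `int()` truncates toward zero, and because a value such as 8.2 has no exact binary representation, the product can land just below the intended integer. The small sketch below compares `int()` with `round()`. The function names are hypothetical, and `word_value` stands in for whatever `w2n.word_to_num` returns for the matched words (1000000 for "million", per the actual output reported at the top of this issue).

```python
def scale_truncating(prefix: str, word_value: int) -> int:
    # Multiplier step using int(): drops any fractional part toward zero
    return int(float(prefix) * word_value)

def scale_rounding(prefix: str, word_value: int) -> int:
    # round() snaps to the nearest integer instead of truncating
    return round(float(prefix) * word_value)

# float() accepts the trailing space that the regex prefix group captures.
# 8.2 is stored slightly below 8.2, so the product lands just under 8200000.
print(scale_truncating("8.2 ", 1_000_000))  # 8199999
print(scale_rounding("8.2 ", 1_000_000))    # 8200000
print(scale_rounding("9.7 ", 1_000_000))    # 9700000
```

Parsing the prefix with `decimal.Decimal` instead of `float` would avoid the binary rounding entirely.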