w2n
Handle a mixture of digits and spelt out numbers?
Input: 9.7 million, 7 million, 4 thousand, etc.
Expected output: 9700000, 7000000, 4000
Actual output: 1000000, 1000000, 1000
I wonder if there is a straightforward fix for this?
Not straightforward, but it looks doable. Thanks for pointing this out. I will look into it as soon as I get some free time. Meanwhile, you're also welcome to fix the error and submit a Pull Request. If it passes all the tests, it will most probably be merged.
In my own code, I've made a slightly hacky solution by combining this package with num2words. My current code will only catch things like 9 million, not 9.7 million, but with a little change to the regex it could be done.
Essentially, I use re.sub to find any digits in the string I'm passing to w2n and use a wrapper of num2words as the replace function (because it gets mad if you pass a str not an int). I've done my best to cut it out from the rest of my code below:
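import re

from num2words import num2words
from word2number import w2n

def strNum2Words(match):
    # re.sub passes a match object; num2words wants an int, not a str
    string = match.group()
    return num2words(int(string))

# origtext is the input string, defined elsewhere in my code
if re.search('\\d+', origtext):
    origtext = re.sub('\\d+', strNum2Words, origtext)
number = w2n.word_to_num(origtext)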
@lhami thanks for this. I solved the problem in another way in the end. Here's the code:
import regex
from word2number import w2n
NUMBERS = r"""
\b(
\d+
([.]\d+)?
[ ]
)?
\b(
zero |
quarters? |
thirds? |
half |
one |
two |
three |
four |
five |
six |
seven |
eight |
nine |
ten |
eleven |
twelve |
thirteen |
fourteen |
fifteen |
sixteen |
seventeen |
eighteen |
nineteen |
twenty |
thirty |
forty |
fifty |
sixty |
seventy |
eighty |
ninety |
hundred |
thousand |
million |
billion
)\b
"""
NUMBER_MATCHER = regex.compile(NUMBERS + r'([ \p{Pd}](and[ ])?' + NUMBERS + r')*', regex.I + regex.VERBOSE)
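# line is the input string, defined elsewhere
m = NUMBER_MATCHER.search(line)
if m:
    # group 1 is the optional leading digits plus trailing space, e.g. "9.7 "
    multiplier = m.group(1)
    try:
        # per the report above, w2n returns only the scale here, e.g. 1000000
        num = w2n.word_to_num(m.group(0))
        if multiplier:
            num = int(float(multiplier) * num)
    except ValueError:
        # fall back to the matched text if w2n cannot parse it
        num = m.group(0)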
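For the plain "digits plus one scale word" inputs from the original report, the same multiplier idea can be sketched with the standard library alone. This is only an illustration, not part of word2number: the names `SCALES`, `MIXED` and `mixed_to_int` are made up for this example, and it assumes a single scale word follows the digits. It uses `Decimal` rather than `float` so that the final `int()` does not truncate a value that binary floating point left slightly below the intended number.

```python
import re
from decimal import Decimal

# Scale words handled by this sketch (an assumption; extend as needed).
SCALES = {
    "hundred": 100,
    "thousand": 1_000,
    "million": 1_000_000,
    "billion": 1_000_000_000,
}

# Digits with an optional decimal part, one space, then a scale word.
MIXED = re.compile(
    r"\b(\d+(?:\.\d+)?)[ ](" + "|".join(SCALES) + r")\b",
    re.IGNORECASE,
)

def mixed_to_int(text):
    """Return the int for inputs like '9.7 million', or None if no match."""
    m = MIXED.search(text)
    if m is None:
        return None
    # Decimal keeps "9.7" exact before multiplying and converting to int.
    return int(Decimal(m.group(1)) * SCALES[m.group(2).lower()])

for sample in ("9.7 million", "7 million", "4 thousand"):
    print(sample, "->", mixed_to_int(sample))
```

Anything this helper does not match (it returns `None`) could still be passed on to `w2n.word_to_num` as before.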