ValueError when parsing number including currency symbol ($)

Open jschnurr opened this issue 6 years ago • 2 comments

I have a source document where positive numbers are expressed as $1,200.00, and negative numbers like this: -$1,716.96.

For positive numbers, the field regex can ignore the $ sign, and the resulting value parses correctly. However, to pick up the minus sign, I need to capture the $ too, which causes the parser to fail.

In the interim, I've used a replace configuration to replace $ with an empty string, but it would be better if we could either 1) concatenate capture groups automatically or 2) handle the currency symbol while parsing numbers.

What is the preferred approach?

DEBUG:root:field=amount_due | regexp=AMOUNT DUE:\s+(-?\$\d+?,?\d+\.\d+)
DEBUG:root:res_find=['-$1,716.96', '-$1,716.96']
Traceback (most recent call last):
  File "/home/user/.virtualenvs/pdftest/bin/invoice2data", line 10, in <module>
    sys.exit(main())
  File "/home/user/.virtualenvs/pdftest/lib/python3.6/site-packages/invoice2data/main.py", line 194, in main
    res = extract_data(f.name, templates=templates, input_module=input_module)
  File "/home/user/.virtualenvs/pdftest/lib/python3.6/site-packages/invoice2data/main.py", line 93, in extract_data
    return t.extract(optimized_str)
  File "/home/user/.virtualenvs/pdftest/lib/python3.6/site-packages/invoice2data/extract/invoice_template.py", line 186, in extract
    output[k] = self.parse_number(res_find[0])
  File "/home/user/.virtualenvs/pdftest/lib/python3.6/site-packages/invoice2data/extract/invoice_template.py", line 106, in parse_number
    return float(amount_pipe_no_thousand_sep.replace('|', '.'))
ValueError: could not convert string to float: '-$1716.96'
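
For reference, option 2 (handling the currency symbol while parsing) could look roughly like the sketch below. This is not invoice2data's actual `parse_number`; the function name `parse_amount`, its parameters, and the default set of currency symbols are assumptions made for illustration only.

```python
def parse_amount(value, decimal_separator=".", currency_symbols="$€£"):
    """Hypothetical helper: convert a captured amount such as '-$1,716.96' to float.

    Not part of invoice2data; a sketch of stripping currency symbols
    before the float conversion that fails in the traceback above.
    """
    # Drop currency symbols and any whitespace, keeping the sign and digits.
    cleaned = "".join(
        ch for ch in value if ch not in currency_symbols and not ch.isspace()
    )
    # Assume the thousands separator is whichever of ',' / '.' is not the decimal one.
    thousands_separator = "," if decimal_separator == "." else "."
    cleaned = cleaned.replace(thousands_separator, "")
    # Normalise the decimal separator to '.' so float() accepts it.
    cleaned = cleaned.replace(decimal_separator, ".")
    return float(cleaned)


negative = parse_amount("-$1,716.96")
positive = parse_amount("$1,200.00")
european = parse_amount("1.200,00", decimal_separator=",")
```

With this approach the field regex can capture the sign and the symbol together, and the symbol is discarded just before conversion.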

jschnurr avatar May 20 '19 15:05 jschnurr

You could define a custom field that captures only the minus sign and merge it with the amount afterwards.

m3nu avatar May 20 '19 15:05 m3nu

Maybe we could allow the regex parser to accept multiple capturing groups and let the template specify how to handle them?

Something like:

amount:
  parser: regex
  regex: AMOUNT DUE:\s+(-?)\$(\d+?,?\d+\.\d+)
  group: concat
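
A rough sketch of how such a `group: concat` setting might behave, using only Python's standard `re` module. The `group` key is just the proposal above, and the helper name `find_concat` and the sample text are made up for illustration.

```python
import re

# Same pattern as in the proposed template: the sign and the digits are
# captured separately, and the '$' between them is matched but not captured.
pattern = re.compile(r"AMOUNT DUE:\s+(-?)\$(\d+?,?\d+\.\d+)")


def find_concat(regex, text):
    """Hypothetical 'concat' handling: join all capture groups of each match."""
    # A group that did not participate in the match is None; treat it as "".
    return ["".join(g or "" for g in m.groups()) for m in regex.finditer(text)]


sample = "AMOUNT DUE:   -$1,716.96\nAMOUNT DUE: $1,200.00"
matches = find_concat(pattern, sample)

# After concatenation the strings contain no currency symbol, so removing
# the thousands separator is enough for float() to succeed.
amounts = [float(s.replace(",", "")) for s in matches]
```

Because the `$` sits outside both groups, the concatenated value is already free of the symbol that caused the `ValueError`.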

rmilecki avatar Jan 22 '23 21:01 rmilecki