
Error unhashable type

Open gitaddgitpush opened this issue 3 years ago • 6 comments

Description of the problem

I'm having trouble with the invoice2data package: I'm getting an error I can't solve.

When I use this template for an invoice:

issuer: My Template
keywords:
- www.webok.com
- 123 4567 89
fields:
  amount: TOTAL\s+.(\d+\.\d+)
  date: Date:\s+(\d{1,2}\/\d{1,2}\/\d{4}\s+\d{1,2}:\d{1,2})
  invoice_number: Reference:\s(\w+)
  operator: Operators:\s(\w+)
options:
  currency: USD
  date_formats:
    - '%d/%m/%Y %G:%i'
  languages:
    - en
  decimal_separator: '.'
lines:
    start: Your Reference:+\s+\w+\n_+
    end: \s+_+\n+\s+TOTAL\s+.(\d+\.\d+)
    line: (?P<description>.+)\s+\((?P<quantity>.+)\)\s+.(?P<price>\d+\.\d+)

I don't get any error. But if I add the following at the end of fields, I get an unhashable type error:

fields:
  ...
  friendly_name:
    parser: static
    value: Amazon

The unhashable type error:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/invoice2data", line 11, in <module>
    load_entry_point('invoice2data==0.3.5', 'console_scripts', 'invoice2data')()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/invoice2data/main.py", line 201, in main
    res = extract_data(f.name, templates=templates, input_module=input_module)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/invoice2data/main.py", line 93, in extract_data
    return t.extract(optimized_str)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/invoice2data/extract/invoice_template.py", line 174, in extract
    res_find = re.findall(v, optimized_str)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 181, in findall
    return _compile(pattern, flags).findall(string)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 237, in _compile
    p, loc = _cache[cachekey]
TypeError: unhashable type: 'OrderedDict'

Can anyone help me with this, please? I believe the error occurs when the software tries to unpack the options, but I don't know how to solve it.
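The last frame of the traceback is inside the pattern cache lookup of the re module, which suggests that re.findall received the whole field mapping instead of a regex string. A minimal sketch of that failure, independent of invoice2data (the sample text and the field value below are illustrative assumptions):

```python
import re
from collections import OrderedDict

text = "TOTAL       €61.74"

# A field written as a plain string is a valid regex pattern.
print(re.findall(r"TOTAL\s+.(\d+\.\d+)", text))  # ['61.74']

# A field written as a mapping is loaded as a dict-like object. Passing it
# to re.findall as the pattern fails with a TypeError.
field = OrderedDict([("parser", "static"), ("value", "Amazon")])

error = None
try:
    re.findall(field, text)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```

If this is what happens here, the template itself is fine and the installed code simply treats every field value as a regex string.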

Here's what I get in debug mode:

...
DEBUG:invoice2data.extract.invoice_template:field=vat_lines | regexp=OrderedDict([('parser', 'lines'), ('start', 'PAYMENT TYPE\\s+AMOUNT\\s+_+'), ('end', '\\s_+\\s+PLEASE KEEP THIS RECEIPT SAFE'), ('line', '(?P<type_paiment>\\w+)\\s+.(?P<montant>\\d+\\.\\d+)'), ('types', OrderedDict([('montant', 'float')]))])

Thanks for your help !

MINIMAL EXAMPLE TO REPRODUCE THE ERROR:

script.py

import pprint
from invoice2data import extract_data
from invoice2data.extract.loader import read_templates

templates = read_templates('templates/')
result = extract_data("invoice.pdf", templates=templates)

pprint.pprint(result)

and this as a template (in the templates folder):

templates/fr/fr.error.yml

issuer: My Template
keywords:
- www.webok.com
- 123 4567 89
fields:
  amount: TOTAL\s+.(\d+\.\d+)
  date: Date:\s+(\d{1,2}\/\d{1,2}\/\d{4}\s+\d{1,2}:\d{1,2})
  invoice_number: Reference:\s(\w+)
  operator: Operators:\s(\w+)
  vat_lines:
    parser: lines
    start: PAYMENT TYPE\s+AMOUNT\s+_+
    end: \s_+\s+PLEASE KEEP THIS RECEIPT SAFE
    line: (?P<type_paiment>\w+)\s+.(?P<montant>\d+\.\d+)
    types:
      montant: float
options:
  currency: USD
  date_formats:
    - '%d/%m/%Y %G:%i'
  languages:
    - en
  decimal_separator: '.'
lines:
    start: Your Reference:+\s+\w+\n_+
    end: \s+_+\n+\s+TOTAL\s+.(\d+\.\d+)
    line: (?P<description>.+)\s+\((?P<quantity>.+)\)\s+.(?P<price>\d+\.\d+)

invoice.txt (needs to be converted to .pdf)

__________________________________________
            My Template
__________________________________________
Date: 03/12/2020 11:23
Operators: Me
Reference: ABC123
__________________________________________
First product                (1)    €12.93
Second product               (3)    €22.93
Third product                (1)    €12.95
Last product                 (1)    €12.93
                              _________
                        TOTAL       €61.74

VAT/CODE         NET      VAT
_____________________________
20%   S       €93.27   €18.66

PAYMENT TYPE           AMOUNT
_____________________________
CASH                   €61.74
CARD                    €0.00

CHANGE GIVEN            €3.07

__________________________________________
PLEASE KEEP THIS RECEIPT SAFE
FOR GUARANTEE PURPOSES
__________________________________________


Thanks for shopping with us!
VAT Number : 123 4567 89
www.webok.com

Debug output

And finally, the full invoice2data debug output:

DEBUG:invoice2data.main:START pdftotext result ===========================
DEBUG:invoice2data.main:__________________________________________
             My Template
__________________________________________
Date: 03/12/2020 11:23
Operators: Me
Reference: ABC123
__________________________________________
First product                  (1)    €12.93
Second product                 (3)    €22.93
Third product                  (1)    €12.95
Last product                   (1)    €12.93
                                _________
                         TOTAL        €61.74

VAT/CODE         NET      VAT
_____________________________
20%   S       €93.27   €18.66

PAYMENT TYPE           AMOUNT
_____________________________
CASH                   €61.74
CARD                    €0.00

CHANGE GIVEN             €3.07

__________________________________________
PLEASE KEEP THIS RECEIPT SAFE
FOR GUARANTEE PURPOSES
__________________________________________



Thanks for shopping with us!
VAT Number : 123 4567 89
www.webok.com


DEBUG:invoice2data.main:END pdftotext result =============================
DEBUG:invoice2data.main:Testing 254 template files
DEBUG:invoice2data.extract.invoice_template:Matched template fr.error.yml
DEBUG:invoice2data.extract.invoice_template:START optimized_str ========================
DEBUG:invoice2data.extract.invoice_template:__________________________________________
             My Template
__________________________________________
Date: 03/12/2020 11:23
Operators: Me
Reference: ABC123
__________________________________________
First product                  (1)    €12.93
Second product                 (3)    €22.93
Third product                  (1)    €12.95
Last product                   (1)    €12.93
                                _________
                         TOTAL        €61.74

VAT/CODE         NET      VAT
_____________________________
20%   S       €93.27   €18.66

PAYMENT TYPE           AMOUNT
_____________________________
CASH                   €61.74
CARD                    €0.00

CHANGE GIVEN             €3.07

__________________________________________
PLEASE KEEP THIS RECEIPT SAFE
FOR GUARANTEE PURPOSES
__________________________________________



Thanks for shopping with us!
VAT Number : 123 4567 89
www.webok.com


DEBUG:invoice2data.extract.invoice_template:END optimized_str ==========================
DEBUG:invoice2data.extract.invoice_template:Date parsing: languages=['en'] date_formats=['%d/%m/%Y %G:%i']
DEBUG:invoice2data.extract.invoice_template:Float parsing: decimal separator=.
DEBUG:invoice2data.extract.invoice_template:keywords=['www.webok.com', '123 4567 89']
DEBUG:invoice2data.extract.invoice_template:{'date_formats': ['%d/%m/%Y %G:%i'], 'lowercase': False, 'decimal_separator': '.', 'currency': 'USD', 'replace': [], 'languages': ['en'], 'remove_whitespace': False, 'remove_accents': False}
DEBUG:invoice2data.extract.invoice_template:field=amount | regexp=TOTAL\s+.(\d+\.\d+)
DEBUG:invoice2data.extract.invoice_template:res_find=[u'61.74']
DEBUG:invoice2data.extract.invoice_template:field=date | regexp=Date:\s+(\d{1,2}\/\d{1,2}\/\d{4}\s+\d{1,2}:\d{1,2})
DEBUG:invoice2data.extract.invoice_template:res_find=[u'03/12/2020 11:23']
DEBUG:invoice2data.extract.invoice_template:result of date parsing=2020-03-12 11:23:00
DEBUG:invoice2data.extract.invoice_template:field=invoice_number | regexp=Reference:\s(\w+)
DEBUG:invoice2data.extract.invoice_template:res_find=[u'ABC123']
DEBUG:invoice2data.extract.invoice_template:field=operator | regexp=Operators:\s(\w+)
DEBUG:invoice2data.extract.invoice_template:res_find=[u'Me']
DEBUG:invoice2data.extract.invoice_template:field=vat_lines | regexp=OrderedDict([('parser', 'lines'), ('start', 'PAYMENT TYPE\\s+AMOUNT\\s+_+'), ('end', '\\s_+\\s+PLEASE KEEP THIS RECEIPT SAFE'), ('line', '(?P<type_paiment>\\w+)\\s+.(?P<montant>\\d+\\.\\d+)'), ('types', OrderedDict([('montant', 'float')]))])
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/invoice2data", line 11, in <module>
    load_entry_point('invoice2data==0.3.5', 'console_scripts', 'invoice2data')()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/invoice2data/main.py", line 201, in main
    res = extract_data(f.name, templates=templates, input_module=input_module)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/invoice2data/main.py", line 93, in extract_data
    return t.extract(optimized_str)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/invoice2data/extract/invoice_template.py", line 174, in extract
    res_find = re.findall(v, optimized_str)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 181, in findall
    return _compile(pattern, flags).findall(string)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 237, in _compile
    p, loc = _cache[cachekey]
TypeError: unhashable type: 'OrderedDict'

gitaddgitpush avatar Jan 14 '21 15:01 gitaddgitpush

I am struggling with the same problem. Every time I try to write new rules in this form:

fields:
  dok_type:
    parser: static
    value: Amazon

I get the same error message. The same happens when I rewrite existing rules, like

ordernr: 'Your order number\s+(\d{9})'

in the format

ordernr:
  parser: regex
  regex: 'Your order number\s+(\d{9})'
  type: int

Each time the message "TypeError: unhashable type: 'OrderedDict'" appears.
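The short string form and the expanded mapping form are meant to describe the same rule, so a loader has to branch on the type of the field value before calling re. The following is only a sketch of that idea: extract_field is a hypothetical helper and the sample text is made up; it is not invoice2data's actual implementation.

```python
import re


def extract_field(spec, text):
    """Hypothetical helper: accept a field as a regex string or a mapping."""
    if isinstance(spec, str):
        # Short form: the value itself is the regex.
        return re.findall(spec, text)
    parser = spec.get("parser")
    if parser == "static":
        # Static fields return a fixed value and never touch the text.
        return spec["value"]
    if parser == "regex":
        matches = re.findall(spec["regex"], text)
        convert = {"int": int, "float": float}.get(spec.get("type"), str)
        return [convert(m) for m in matches]
    raise ValueError(f"unsupported parser: {parser!r}")


text = "Your order number 123456789"  # assumed sample text
fields = {
    "ordernr_short": r"Your order number\s+(\d{9})",
    "ordernr": {
        "parser": "regex",
        "regex": r"Your order number\s+(\d{9})",
        "type": "int",
    },
    "dok_type": {"parser": "static", "value": "Amazon"},
}
result = {name: extract_field(spec, text) for name, spec in fields.items()}
print(result)
```

Code that lacks this kind of branch would pass the mapping straight to re.findall, which would match the error reported above.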

christian-roeser avatar Mar 08 '21 13:03 christian-roeser

From TUTORIAL.md, "Each field can be defined as:

  • an associative array with parser, specifying parsing method ..."

Solution

Therefore, the following syntax works for me:

fields:
  total: {
    parser: regex,
    regex: 'Total.*\$(\d+\.?\d+)',
    type: float
  }

wdrammeh avatar Mar 15 '21 09:03 wdrammeh

From TUTORIAL.md, "Each field can be defined as:

  • an associative array with parser, specifying parsing method ..."

Solution

Therefore, the following syntax works for me:

fields:
  total: {
    parser: regex,
    regex: 'Total.*\$(\d+\.?\d+)',
    type: float
  }

This doesn't work for me; I get the same problem.

erkin98 avatar Mar 15 '21 11:03 erkin98

@RossK1 @m3nu, can you help us?

erkin98 avatar Mar 22 '21 14:03 erkin98

@gitaddgitpush I followed the steps to reproduce the error but couldn't. Can you give me more information about your working environment, such as your OS and Python version?

Here is the output I got:

{'amount': 61.74,
 'currency': 'USD',
 'date': datetime.datetime(2020, 3, 12, 11, 23),
 'desc': 'Invoice from My Template',
 'friendly_name': 'Amazon',
 'invoice_number': 'ABC123',
 'issuer': 'My Template',
 'operator': 'Me',
 'vat_lines': [{'montant': 61.74, 'type_paiment': 'CASH'},
               {'montant': 0.0, 'type_paiment': 'CARD'},
               {'montant': 3.07, 'type_paiment': 'GIVEN'}]}
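The three vat_lines entries can be reproduced by applying the line regex to each line between the start and end markers. This is a sketch under the assumption that the lines parser uses search semantics on each line; it is not the library's code.

```python
import re

# Text between the `start` and `end` patterns of the vat_lines field.
block = """\
CASH                   €61.74
CARD                    €0.00

CHANGE GIVEN             €3.07
"""

line_re = re.compile(r"(?P<type_paiment>\w+)\s+.(?P<montant>\d+\.\d+)")

rows = []
for line in block.splitlines():
    m = line_re.search(line)  # assumption: search, not match, on each line
    if m:
        rows.append({
            "type_paiment": m.group("type_paiment"),
            "montant": float(m.group("montant")),  # types: montant: float
        })
print(rows)
```

The third entry reads GIVEN rather than CHANGE GIVEN because \w+ cannot span the space, so the match starts at the second word. If that row is unwanted, anchoring the pattern with ^ would probably exclude it, since CHANGE is not directly followed by whitespace, one character and the amount.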

duskybomb avatar Mar 31 '21 20:03 duskybomb

You need release 0.3.6 or newer for the fields: syntax support in YAML. Can you re-test with 0.3.6, please?
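The traceback in the first comment reports invoice2data==0.3.5, which is below that threshold. A quick way to check an environment is sketched below; parse_version and supports_mapping_fields are hypothetical helper names, and the comparison only looks at the numeric parts of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (0, 3, 6)  # release named in the comment above


def parse_version(text):
    """Turn '0.3.5' into (0, 3, 5); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def supports_mapping_fields(version_text):
    """True if the version is at least MIN_VERSION."""
    return parse_version(version_text) >= MIN_VERSION


try:
    installed = version("invoice2data")
    print(installed, supports_mapping_fields(installed))
except PackageNotFoundError:
    print("invoice2data is not installed in this environment")
```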

rmilecki avatar Sep 11 '21 21:09 rmilecki

I used the template and invoice content provided by @gitaddgitpush in the first comment. It was parsed without any error as:

[
    {
        "issuer": "My Template",
        "amount": 61.74,
        "date": "2020-03-12",
        "invoice_number": "ABC123",
        "operator": "Me",
        "vat_lines": [
            {
                "type_paiment": "CASH",
                "montant": 61.74
            },
            {
                "type_paiment": "CARD",
                "montant": 0.0
            },
            {
                "type_paiment": "GIVEN",
                "montant": 3.07
            }
        ],
        "currency": "USD",
        "lines": [],
        "desc": "Invoice from My Template"
    }
]

rmilecki avatar Jan 22 '23 20:01 rmilecki