invoice2data

randomly changing order of parsed field results

Open • juanluisrosaramos opened this issue 4 years ago • 1 comment

I've found that some code in parser/regex.py changes the order of the parsing results. This affects my CSV export, but I don't know whether you would consider it a bug, something to improve, or neither.

Let me explain. I'm parsing invoices from gas suppliers. I parse one field with the consumption in kWh and associate it with another field holding the price per kWh at that moment. Some invoices have several tranches, so the parser returns a list of results for each field.
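To make the association concrete, here is a minimal sketch that pairs two parallel result lists by position. The values are made-up sample data and pair_tranches is a hypothetical helper, not part of invoice2data; it only gives correct pairs if both lists keep the order in which the tranches appear on the invoice.

```python
def pair_tranches(consumptions, prices):
    """Pair each consumption (kWh) with the price per kWh at the same index.

    Hypothetical helper for illustration: it assumes both lists come from
    separate fields and are in the order the tranches appear on the invoice.
    """
    if len(consumptions) != len(prices):
        raise ValueError(
            f"{len(consumptions)} consumption values but {len(prices)} prices; "
            "cannot pair tranches by position"
        )
    return list(zip(consumptions, prices))


# Made-up values for an invoice with two tranches.
consumptions = [120.5, 80.25]
prices = [0.045, 0.052]

for kwh, price in pair_tranches(consumptions, prices):
    print(f"{kwh} kWh at {price} per kWh")
```

If either list is reordered, as list(set(...)) may do, zip silently pairs the wrong values; the length check only catches lists of different sizes.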

In parser/regex.py, line 46, the call to "set" changes the order of the list arbitrarily, so I can't associate each price with its consumption, since they come from separate parser results:

    # Remove duplicates maintaining the order by default (it's more
    # natural). Don't do that for legacy parsing to keep backward
    # compatibility.
    if legacy:
        result = list(set(result))

In my case, result = list(OrderedDict.fromkeys(result)) removes duplicates and preserves the order, so it works better for me.
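The difference between the two approaches can be checked in plain Python. The sample values below are made up; the sketch only illustrates the standard-library behaviour, not invoice2data itself.

```python
from collections import OrderedDict

# Made-up consumption values (kWh); the first tranche value repeats.
consumptions = [120.5, 80.25, 120.5]

# set() drops duplicates but makes no guarantee about the resulting order.
unordered = list(set(consumptions))

# OrderedDict.fromkeys keeps the first occurrence of each value, in order.
ordered = list(OrderedDict.fromkeys(consumptions))

# Plain dicts preserve insertion order since Python 3.7, so this is equivalent.
ordered_plain = list(dict.fromkeys(consumptions))

print(ordered)        # [120.5, 80.25]
print(ordered_plain)  # [120.5, 80.25]
```

Keep in mind that de-duplication of any kind collapses repeated values: here three tranches become two entries, so a price list that still has three values would no longer line up by position.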

Thank you

juanluisrosaramos, Apr 30 '21 09:04

Feel free to open a PR if something can be improved.

m3nu, Apr 30 '21 14:04

    if legacy:
        result = list(set(result))

This is used for legacy YAML syntax only. It has to stay that way for backward compatibility.

    result = list(OrderedDict.fromkeys(result))

This is used for the new YAML syntax.

So basically, replace your

    fields:
      foo: Consumption.*?(\d+\.\d+).*kWh

with

    fields:
      foo:
        parser: regex
        regex: Consumption.*?(\d+\.\d+).*kWh
        type: float
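As a rough, standalone approximation of such a field (assumption: re.findall plus float() stands in for the regex parser with type: float; this is not invoice2data's own code), the sketch below extracts every consumption figure from made-up invoice text. It also shows a general regex pitfall: a greedy .* right before the capture group, as in Consumption.*(\d+\.\d+).*kWh, consumes as many characters as it can and leaves only the last digit before the decimal point for the group, whereas a lazy .*? captures the whole number.

```python
import re

# Made-up invoice text with two consumption tranches.
INVOICE_TEXT = """\
Consumption tranche 1: 123.45 kWh
Consumption tranche 2: 67.80 kWh
"""

GREEDY = r"Consumption.*(\d+\.\d+).*kWh"   # leading .* is greedy
LAZY = r"Consumption.*?(\d+\.\d+).*kWh"    # leading .*? is lazy


def extract_floats(pattern, text):
    """Return every captured group converted to float, in document order.

    Standalone approximation: re.findall plus float() stands in for a
    regex field with type: float; it is not invoice2data's own code.
    """
    return [float(match) for match in re.findall(pattern, text)]


print(extract_floats(GREEDY, INVOICE_TEXT))  # [3.45, 7.8]
print(extract_floats(LAZY, INVOICE_TEXT))    # [123.45, 67.8]
```

re.findall returns matches in the order they appear in the text, so the list keeps the tranche order until a de-duplication step such as the ones discussed in this thread runs.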

rmilecki, Aug 29 '22 20:08