invoice2data randomly changing order of parsed field results

I've found that parser/regex.py contains logic that changes the order of the parsing results. This affects my CSV export, but I don't know whether you would consider it a bug, a possible improvement, or neither.

Let me explain. I'm parsing gas suppliers' invoices. One field is the consumption in kWh, and I associate it with another field, the price per kWh at that time. Sometimes the same invoice has several tranches, so the parser returns a list of results.

In parser/regex.py, line 46, "set" changes the order of the list unpredictably, so I can't match prices to consumptions, because they come from different parser results.

    # Remove duplicates maintaining the order by default (it's more
    # natural). Don't do that for legacy parsing to keep backward
    # compatibility.
    if legacy:
        result = list(set(result))

In my case, result = list(OrderedDict.fromkeys(result)) removes duplicates and preserves order, so it works better for me.
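To illustrate the difference, here is a minimal sketch that runs outside invoice2data. The sample values in `result` are made up for illustration; only the two deduplication expressions come from the code discussed above.

```python
from collections import OrderedDict

# Hypothetical parser output: one consumption value per tranche,
# with one repeated value.
result = ["120.5", "80.0", "120.5", "45.2"]

# Legacy path: a set has no defined order, so positions may change.
legacy_result = list(set(result))

# Order-preserving alternative: keeps the first occurrence of each value.
ordered_result = list(OrderedDict.fromkeys(result))

print(sorted(legacy_result))  # same values as below; original order not guaranteed
print(ordered_result)         # ['120.5', '80.0', '45.2']
```

Note that both approaches drop repeated values, so if two tranches happen to share the same value, the lists for price and consumption may still end up with different lengths.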

Thank you

Apr 30 '21 09:04 juanluisrosaramos

Feel free to open a PR if something can be improved.

Apr 30 '21 14:04 m3nu

if legacy: result = list(set(result))

This is used for legacy YAML syntax only. It has to stay that way for backward compatibility.

In my case, result = list(OrderedDict.fromkeys(result))

This is used for new YAML syntax.

So basically replace your

fields:
  foo: Consumption.*(\d+\.\d+).*kWh

with

fields:
  foo:
    parser: regex
    regex: Consumption.*(\d+\.\d+).*kWh
    type: float

Aug 29 '22 20:08 rmilecki