randomly changing order of parsed field results
I've found that parser/regex.py changes the order of the parsing results. This affects my CSV export, but I don't know whether you'd consider it a bug, a possible improvement, or neither.
Let me explain. I'm parsing gas suppliers' invoices. One field is the consumption in kWh, and I associate it with another field, the price per kWh at that time. Sometimes there are several tranches on the same invoice, so the parser returns a list of results for each field.
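To make the use case concrete, here is a minimal sketch of pairing the two result lists by position before writing CSV rows. The values, the variable names and the helpers `pair_tranches` and `to_csv` are hypothetical and not part of the library; the sketch only illustrates why both lists have to keep the order in which the tranches appear on the invoice.

```python
import csv
import io

# Hypothetical parse results: one entry per tranche, in invoice order.
consumption_kwh = [120.5, 80.0, 45.25]
price_per_kwh = [0.051, 0.048, 0.046]


def pair_tranches(consumption, prices):
    """Pair the i-th consumption with the i-th price."""
    if len(consumption) != len(prices):
        raise ValueError("field lists have different lengths")
    return list(zip(consumption, prices))


def to_csv(rows):
    """Render the paired rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["consumption_kwh", "price_per_kwh"])
    writer.writerows(rows)
    return buf.getvalue()


rows = pair_tranches(consumption_kwh, price_per_kwh)
csv_text = to_csv(rows)
```

If either list is reordered independently of the other, the pairs no longer describe the same tranche.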
In parser/regex.py, line 46, "set" randomly changes the order of the list, so I can't associate price and consumption, because they come from different parser results.
# Remove duplicates maintaining the order by default (it's more
# natural). Don't do that for legacy parsing to keep backward
# compatibility.
if legacy:
    result = list(set(result))
In my case, result = list(OrderedDict.fromkeys(result)) removes duplicates and preserves order, so it works better for me.
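The difference between the two approaches can be sketched in isolation. The function names `dedupe_legacy` and `dedupe_ordered` and the sample matches are made up for illustration; only the two one-line expressions come from the snippet above.

```python
from collections import OrderedDict


def dedupe_legacy(values):
    # A set has no defined order, so the resulting list order is arbitrary.
    return list(set(values))


def dedupe_ordered(values):
    # OrderedDict.fromkeys keeps the first occurrence of each value,
    # in the order the values were first seen.
    return list(OrderedDict.fromkeys(values))


# Hypothetical regex matches, with one repeated value.
matches = ["120.5", "80.0", "120.5", "45.25"]

ordered = dedupe_ordered(matches)
legacy = dedupe_legacy(matches)
```

Here `ordered` is always `["120.5", "80.0", "45.25"]`, while `legacy` holds the same three values in an order that can differ between runs, because string hashing is randomized per process by default. On Python 3.7 and later, `list(dict.fromkeys(values))` gives the same ordered result. Note that both approaches drop repeated values, so two tranches with an identical price would collapse into one entry.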
Thank you
Feel free to open a PR, if something can be improved.
if legacy: result = list(set(result))
This is used for legacy YAML syntax only. It has to stay that way for backward compatibility.
In my case, result = list(OrderedDict.fromkeys(result))
This is used for new YAML syntax.
So basically replace your
fields:
  foo: Consumption.*(\d+\.\d+).*kWh
with
fields:
  foo:
    parser: regex
    regex: Consumption.*(\d+\.\d+).*kWh
    type: float