invoice2data
Regex matching second field instead of first
Hi, first off, I want to say I love this project and have had a lot of fun playing with it over the last few days. I have been writing my own templates, and this has mostly gone well, but I have started running into issues I can't fully wrap my head around. Here is one.
I have a PDF invoice that contains the following section (after running it through pdftotext):
TVA (20%%)
Total TTC
28,56 €
171,36 €
I have the following template field:
fields:
  amount: Total TTC\s+([\d,]+)\s€
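A quick way to check the pattern outside invoice2data is Python's `re` module (invoice2data itself is written in Python). This minimal sketch feeds the pattern the extract exactly as pasted above, with the lines joined by newlines; against that text, it captures the first amount, not the second:

```python
import re

# The pdftotext extract exactly as pasted above, one line per entry.
pasted = "TVA (20%%)\nTotal TTC\n28,56 €\n171,36 €\n"

# The template's amount pattern, unchanged.
pattern = r"Total TTC\s+([\d,]+)\s€"

# \s+ consumes the newline after "Total TTC", so the group grabs "28,56".
print(re.findall(pattern, pasted))
```

Since the debug output reports `171,36` instead, the text invoice2data actually matches against probably differs from this extract.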
When I run:
invoice2data "my_invoice.pdf" \
--debug \
--input-reader pdftotext \
--template-folder "my_templates" --exclude-built-in-templates \
--output-format json --output-name "output.json"
I get this:
DEBUG:invoice2data.extract.parsers.regex: field=amount | regex=Total TTC\s+([\d,]+)\s. | matches=['171,36']
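One possible explanation, offered as an assumption to check rather than a confirmed cause: invoice2data's pdftotext reader may call pdftotext with its `-layout` option, which keeps each label on the same line as its value. Running `pdftotext -layout my_invoice.pdf -` prints that version of the text to the terminal for comparison. If it looks roughly like the hypothetical text in this sketch (the spacing is invented), then `171,36` really is the first number after `Total TTC`, and `28,56` sits on the TVA line:

```python
import re

# Hypothetical layout-preserving text: each label shares a line with its value.
# The exact spacing is an assumption, not real pdftotext output.
layout = (
    "TVA (20%%)          28,56 €\n"
    "Total TTC          171,36 €\n"
)

pattern = r"Total TTC\s+([\d,]+)\s€"

# Here \s+ only consumes the spaces on the "Total TTC" line.
print(re.findall(pattern, layout))
```

Under that assumption the pattern is doing exactly what it says; the pasted extract is just not the text the template is matched against.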
Can you explain why the second value (171,36) is matched instead of the first (28,56)?
Also, what regex should I write if I want to store the two values in the amount_taxes and amount fields respectively?
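Still assuming the hypothetical layout-mode text from the previous sketch, one approach is to give each field its own pattern anchored on its own label, so neither depends on the order of the numbers. The field names mirror the question and have not been checked against invoice2data's list of standard fields:

```python
import re

# Hypothetical layout-mode text (an assumption; spacing is invented).
text = (
    "TVA (20%%)          28,56 €\n"
    "Total TTC          171,36 €\n"
)

# One pattern per field, each anchored on its own label.
patterns = {
    # "TVA", the parenthesised rate (any characters except ")"), then the amount.
    "amount_taxes": r"TVA\s+\([^)]*\)\s+([\d,]+)\s€",
    # "Total TTC" followed by the amount on the same line.
    "amount": r"Total TTC\s+([\d,]+)\s€",
}

# Apply each pattern and keep the first capture group.
values = {name: re.search(p, text).group(1) for name, p in patterns.items()}
print(values)
```

In the template, each pattern would then sit under `fields:` as its own key.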
Thanks!