Trouble getting lookahead and lookbehind to work in the regex for the amount field
I'm trying to extract the amount from an invoice, and I need lookarounds to pick out the correct amount.
Here is the expression I'm using:
(?<=\w:\s)[\d+\.]{0,}\d+,\d*(?=\s)
It's supposed to match the amount (9,95) in something like: GESAMT: 9,95 €
I've tested it online at regex101 with the Python flavor, and it works properly there.
But I keep getting the warning "regexp for field amount didn't match".
Can anyone tell me how to fix it? If not, could you at least tell me whether the problem is in my regex or in the library?
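Templates can be set to remove all whitespace because it generally makes matching more reliable. Maybe that's related to your issue? If the whitespace is stripped before matching, the \s in your lookbehind and lookahead has nothing left to match.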
Your regex also looks needlessly complicated. I'd start by simplifying it a bit and looking at the debug output from invoice2data to see the actual extracted text.
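To illustrate the whitespace point, here is a small standalone Python sketch using the regex from the question. It does not call invoice2data at all; it only assumes that whitespace removal amounts to deleting every whitespace character from the extracted text, which may not be exactly what the library does.

```python
import re

# Pattern from the question, unchanged.
pattern = re.compile(r"(?<=\w:\s)[\d+\.]{0,}\d+,\d*(?=\s)")

line = "GESAMT: 9,95 €"
# Assumption: whitespace removal behaves like deleting all whitespace characters.
stripped = re.sub(r"\s+", "", line)

with_spaces = pattern.search(line)
without_spaces = pattern.search(stripped)

# The lookbehind and the lookahead both require a whitespace character,
# so the pattern can only match while the spaces are still present.
print(with_spaces.group(0) if with_spaces else None)
print(without_spaces.group(0) if without_spaces else None)
```

If the debug output shows the text without spaces, that would explain why the same pattern works on regex101 but not in the template.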
@m3nu I saw your reply a bit late; I ended up going with custom parsing for that (using pdfplumber and finding the string manually).
I know it's not directly related to my problem, but if you don't mind telling me, how would I access the debug output?
Run the command below in the command prompt, replacing my_invoice.pdf with the name of your invoice.
invoice2data --debug my_invoice.pdf
Thanks @C-Maxim for pointing out the --debug option.
As for the regex for your case, I'd suggest something much simpler, like:
amount: GESAMT:\s*(\d+,\d+)\s*€
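A quick way to sanity-check that pattern outside the library is plain Python `re`. This is only a sketch: `parse_amount` is a hypothetical helper, not part of invoice2data, and the two sample strings are made up to cover text with and without whitespace.

```python
import re

# Regex from the suggestion above. The leading "amount:" is the template
# field name, not part of the pattern itself.
amount_re = re.compile(r"GESAMT:\s*(\d+,\d+)\s*€")

def parse_amount(text):
    """Return the amount as a float, or None if the pattern does not match."""
    match = amount_re.search(text)
    if match is None:
        return None
    # Convert the decimal comma to a dot before calling float().
    return float(match.group(1).replace(",", "."))

print(parse_amount("GESAMT: 9,95 €"))  # text with whitespace kept
print(parse_amount("GESAMT:9,95€"))    # text with whitespace removed
```

Because `\s*` allows zero whitespace characters, the pattern matches either way. Note that an amount with a thousands separator, such as 1.234,95, would not match as written and would need a wider character class.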