Rafał Miłecki
I used the template and invoice content provided by @gitaddgitpush in the first comment. It got parsed without any error as: ``` [ { "issuer": "My Template", "amount": 61.74, "date": "2020-03-12",...
Whether invoice2data and pdftotext modules works to extract text and parse bank statement pdf files?
> No template for C:/Users/Guest/PycharmProjects/MacineLearning/invoice2data/invoice/Howard_Bank.pdf
>
> I tried using the invoice2data and pdftotext Python modules. For invoice PDF files I am able to capture the required fields using YAML files with...
Whether invoice2data and pdftotext modules works to extract text and parse bank statement pdf files?
> Where to use this line plugin, whether in template folders where YAML files are placed?

The `lines` plugin / parser can be used for any field in your YAML template....
Duplicate of https://github.com/invoice-x/invoice2data/issues/239
Solved by 6ab31047386a ("Treat all YAML templates files as UTF-8 encoded"). Make sure your .yml template files are UTF-8 encoded. See https://github.com/invoice-x/invoice2data/pull/449
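A quick way to check whether a template file is valid UTF-8 before loading it is sketched below. This is only an illustration: the helper name `is_utf8` is made up here and is not part of `invoice2data`.

```python
from pathlib import Path


def is_utf8(path):
    """Return True if the file at `path` decodes cleanly as UTF-8.

    Hypothetical helper for illustration; not an invoice2data function.
    """
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Running it over every `.yml` file in a templates directory should point out the files that need to be re-saved as UTF-8.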
Maybe we could allow the regex parser to accept multiple capturing groups and let the template specify how to handle them? Something like: ``` amount: parser: regex regex: AMOUNT DUE:\s+(-?)\$(\d+?,?\d+\.\d+) group: concat...
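To make the idea concrete, here is a rough Python sketch of what concatenating the capturing groups could mean for the regex above. It is not how `invoice2data` is implemented; the function names and the sample strings are assumptions made for illustration only.

```python
import re

# Pattern copied from the proposal above.
PATTERN = re.compile(r"AMOUNT DUE:\s+(-?)\$(\d+?,?\d+\.\d+)")


def concat_groups(text):
    """Join every capturing group of the first match into one string.

    Hypothetical helper: returns None when the pattern does not match.
    """
    match = PATTERN.search(text)
    if match is None:
        return None
    # Groups that did not participate in the match are treated as empty.
    return "".join(group or "" for group in match.groups())


def to_amount(text):
    """Turn the concatenated groups into a float, dropping ',' separators."""
    joined = concat_groups(text)
    if joined is None:
        return None
    return float(joined.replace(",", ""))
```

With this approach the optional sign captured by the first group ends up in front of the digits captured by the second one, so a text such as `AMOUNT DUE:   -$1,234.56` would yield a negative amount.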
If you use a more recent `invoice2data` you'll see a lot of helpful messages: a list of template directories, the number of loaded templates, matching results. As for your case I believe...
Thanks @C-Maxim for pointing out the `--debug` option. As for the regex for your case, I'd suggest something much simpler like:

```
amount: GESAMT:\s*(\d+,\d+)\s*€
```
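The regex can be tried out in plain Python before putting it into a template. The snippet below is just a sketch: the helper name and the comma-to-dot conversion are assumptions for illustration, and `invoice2data` does its own value conversion.

```python
import re

# Same pattern as suggested above.
GESAMT = re.compile(r"GESAMT:\s*(\d+,\d+)\s*€")


def extract_gesamt(text):
    """Return the captured total as a float, or None if nothing matches.

    Hypothetical helper; assumes ',' is the decimal separator.
    """
    match = GESAMT.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))
```

Note that this pattern only accepts a comma as the decimal separator and no thousands separator, so a total written as `1.234,56` would not match.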
I use it as a lib, but whenever something goes wrong I have to use the CLI to debug. So I'm all for adding some better error handling.
```
lines:
  parser: lines
  start: Article Description with Specs.*
  end: C.S.T.
  first_line: \s*(?P[\d]+)\s+(?P.*)\s+/(?P.*)\s+/(?P.*)\s+(?P\w\w)\s+(?P\d+\.\d+)\s+(?P\d+\.\d+)\s+(?P\d+\.\d+)\s+(?P\d+\.\d+)
  line: \s+(?P.+)\s+(?P\d+\.\d+)\s+(?P\d+\.\d+)\s+(?P\d+\.\d+)\s+(?P\d+\.\d+)
```
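For readers unfamiliar with the `start` / `end` / `line` idea, a much simplified Python sketch of it follows. This is not the `invoice2data` implementation: the function `parse_lines`, the group names `desc`, `qty`, `price` and the sample text are all made up for illustration, and `first_line` handling is left out.

```python
import re


def parse_lines(text, start, end, line):
    """Collect named groups from rows located between `start` and `end`.

    Hypothetical, simplified stand-in for a lines-style parser.
    """
    start_match = re.search(start, text)
    if start_match is None:
        return []
    rest = text[start_match.end():]
    end_match = re.search(end, rest)
    if end_match is None:
        return []
    body = rest[:end_match.start()]
    rows = []
    for raw in body.splitlines():
        row_match = re.search(line, raw)
        if row_match:
            rows.append(row_match.groupdict())
    return rows


# Made-up sample text, only loosely shaped like the invoice discussed above.
SAMPLE = """Invoice 42
Article Description with Specs   Qty   Price
Widget A      2    10.00
Widget B      1     5.50
C.S.T. 2%
"""

rows = parse_lines(
    SAMPLE,
    start=r"Article Description with Specs.*",
    end=r"C\.S\.T\.",
    line=r"(?P<desc>\S.*?)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d+)",
)
```

Each matching row becomes a dict keyed by the named groups, which is why every group in the `line` regex needs a name.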