invoice2data

Support "priority" and add 2 templates for InsERT-software-generated invoices

Open rmilecki opened this issue 1 year ago • 2 comments

templates: pl: add templates for InsERT's software issued invoices

InsERT is one of the most popular Polish accounting software companies.
They have two very common products:
1. Subiekt nexo
2. Subiekt GT

Add 2 templates to parse invoices generated by the above software. Those
templates are software-specific, so they have priority set to 3.

This allows parsing a lot of invoices issued in Poland.

Regex: support more grouping functions

In case of multiple matches we only had an option to return a sum of
numeric values. Add more functions.

Add "priority" support for templates

If multiple templates match a given invoice, choose the one
with the highest "priority" value. To support prioritizing while keeping
existing templates working (backward compatibility), a default
value of 5 is assumed when the "priority" property is missing.
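
To make the selection rule concrete, here is a minimal sketch in Python. It is not the code from this pull request: templates are modelled as plain dicts, and `pick_template`, `DEFAULT_PRIORITY` and the issuer names are hypothetical names chosen for illustration. Only the default of 5 and the "highest priority wins" rule come from the description above.

```python
# Assumption: a template is modelled as a plain dict; the real project
# uses its own template class.
DEFAULT_PRIORITY = 5  # value assumed when "priority" is missing


def pick_template(matching_templates):
    """Return the matching template with the highest priority, or None."""
    if not matching_templates:
        return None
    # Tie-breaking is not specified in the description; max() simply
    # keeps the first of several equally ranked templates.
    return max(
        matching_templates,
        key=lambda t: t.get("priority", DEFAULT_PRIORITY),
    )


# Hypothetical templates that both matched the same invoice.
templates = [
    {"issuer": "Generic Subiekt GT invoice", "priority": 3},
    {"issuer": "Example Company Sp. z o.o."},  # no "priority": treated as 5
]

chosen = pick_template(templates)
print(chosen["issuer"])
```

With these inputs the company-specific template is chosen, because its implicit priority of 5 outranks the software-specific priority of 3.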

This feature can be used for writing more generic as well as more
specific templates. So far all templates were assumed to be
company-specific. With this change we can have:
1. Invoice-generating software specific templates
2. In-company varying templates

This feature may be very useful for:
1. Countries with just a few very popular accounting software applications
2. Big companies with multiple departments adding some invoice details

rmilecki avatar Oct 01 '22 17:10 rmilecki

Thanks for this very interesting PR!

Will need some time to test it. I like the idea of creating generic templates, even selecting on the accounting software or library that generated the invoice. At some point it might be interesting to look at / filter on the metadata embedded in the invoice file.

Just a thought: will the use of priorities have a big impact on processing time?

Either way, being able to parse multiple invoices and minimize the writing of specific templates is a big plus.

bosd avatar Oct 03 '22 06:10 bosd

I'm happy to get a positive response to this. Take your time to review / test this.

> Will the use of priorities have a big impact on processing time?

I think loading templates takes the most time. That part isn't changed by this pull request.

With this change, matches_input() gets called for every loaded template. Previously it was called only until the first match. matches_input() is a simple function that just uses re.search(), so this pull request shouldn't affect performance much.
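
A rough sketch of the difference, counting how often the matching function runs under each strategy. The `Template` class here is a hypothetical stand-in, not the project's real class; its `matches_input()` only mirrors the "just uses re.search()" description above, and `first_match` / `best_match` are illustrative names.

```python
import re


class Template:
    """Hypothetical stand-in for a loaded template."""

    calls = 0  # counts matches_input() invocations across all templates

    def __init__(self, name, keyword, priority=5):
        self.name = name
        self.keyword = keyword
        self.priority = priority

    def matches_input(self, text):
        Template.calls += 1
        return re.search(self.keyword, text) is not None


def first_match(templates, text):
    """Old behaviour: stop at the first template that matches."""
    for template in templates:
        if template.matches_input(text):
            return template
    return None


def best_match(templates, text):
    """New behaviour: test every template, keep the highest priority."""
    matching = [t for t in templates if t.matches_input(text)]
    return max(matching, key=lambda t: t.priority, default=None)


templates = [
    Template("software-generic", r"Subiekt GT", priority=3),
    Template("company-specific", r"Example Company"),
    Template("unrelated", r"Some Other Vendor"),
]
text = "Invoice issued by Example Company, generated with Subiekt GT"

Template.calls = 0
old = first_match(templates, text)
old_calls = Template.calls

Template.calls = 0
new = best_match(templates, text)
new_calls = Template.calls

print(old.name, old_calls)
print(new.name, new_calls)
```

In this toy setup the old strategy stops after one call and returns the generic template, while the new one makes one call per loaded template and returns the company-specific one. The extra cost is one cheap regex search per additional template.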


Since you mentioned performance, I think we should check why loading templates takes so much time. Is it file access or parsing YAML? Maybe we could optimize / cache?

(Just a topic for another discussion)

rmilecki avatar Oct 03 '22 06:10 rmilecki