invoice2data

Is there support for multiple regexes in the lines plugin?

Open AshNTU opened this issue 6 years ago • 2 comments

I have been using the library for some time to parse my company's invoices. My invoices have line items that can appear in either of two formats. One option is to create a separate template file for each format. The other would be support for multiple regexes in the lines parser, so that the parser just picks the one for which a match is found.

AshNTU, Jun 22 '19 12:06

For many fields, you can add a list of regexes in the template and it will try all of them. I'm not sure whether the lines plugin is implemented in a similar way. If not, just add it and open a pull request.

m3nu, Jun 25 '19 00:06

Right now the lines parser supports only one set of rules, like:

fields:
  lines:
    parser: lines
    start: Item\s+Discount\s+Price$
    end: \s+Total
    line: (?P<description>.+)\s+(?P<discount>\d+.\d+)\s+(?P<price>\d+\d+)

Whenever I deal with a company that randomly adds or removes a column, I make that column optional with a ?, e.g.

line: (?P<description>.+)\s+(?P<discount>\d+.\d+)?\s+(?P<price>\d+\d+)

(in the above example, discount is optional)
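
To see how an optional column behaves, here is a minimal sketch using plain Python `re` rather than invoice2data itself. It is a variation on the pattern above, not the exact regex: the lazy `.+?`, the escaped dots, the decimal price, the `^`/`$` anchors and the sample rows are all assumptions made for illustration. The lazy description matters because a greedy `.+` can swallow the optional column into the description.

```python
import re

# Variation on the line pattern above (assumptions: lazy description,
# escaped dots, decimal price, whitespace folded into the optional group).
LINE = re.compile(
    r"^(?P<description>.+?)\s+"
    r"(?:(?P<discount>\d+\.\d+)\s+)?"
    r"(?P<price>\d+\.\d+)$"
)


def parse_row(row):
    """Return the named groups for one line item, or None if it does not match."""
    match = LINE.search(row)
    return match.groupdict() if match else None


# Hypothetical sample rows: one with the discount column, one without.
with_discount = parse_row("Blue widget   5.00   12.50")
without_discount = parse_row("Gadget   30.00")
print(with_discount)
print(without_discount)
```

When the column is missing, the `discount` group is simply `None`, so downstream code has to handle that case.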


As I understand it, you're dealing with a company that uses two or more completely different layouts for its lines-covered section?

So are you looking for support for something like:

fields:
  lines:
    parser: lines
    rules:
      - start: Item\s+Discount\s+Price$
        end: \s+Total
        line: (?P<description>.+)\s+(?P<discount>\d+.\d+)\s+(?P<price>\d+\d+)
      - start: Item\s+Price$
        end: \s+Total
        line: (?P<description>.+)\s+(?P<price>\d+\d+)

Is that correct?
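
For illustration, the behaviour such a `rules` list might have can be sketched in plain Python. This is only a guess at the semantics (use the first rule set whose `start` matches and which yields rows), not invoice2data's actual implementation. The `parse_lines` helper, the escaped dots, the decimal prices and the sample invoices are hypothetical.

```python
import re

# Hypothetical rule sets mirroring the proposed template. Escaped dots,
# decimal prices and the lazy description are assumptions for this sketch.
RULES = [
    {
        "start": r"Item\s+Discount\s+Price$",
        "end": r"\s+Total",
        "line": r"(?P<description>.+?)\s+(?P<discount>\d+\.\d+)\s+(?P<price>\d+\.\d+)$",
    },
    {
        "start": r"Item\s+Price$",
        "end": r"\s+Total",
        "line": r"(?P<description>.+?)\s+(?P<price>\d+\.\d+)$",
    },
]


def parse_lines(text, rules):
    """Return rows from the first rule set that finds its start and yields rows."""
    for rule in rules:
        rows = []
        inside = False
        for row in text.splitlines():
            if not inside:
                # Skip everything up to and including the header row.
                inside = re.search(rule["start"], row) is not None
                continue
            if re.search(rule["end"], row):
                break
            match = re.search(rule["line"], row)
            if match:
                rows.append(match.groupdict())
        if rows:
            return rows
    return []


# Made-up sample invoices, one per layout.
INVOICE_A = """Item      Discount  Price
Widget    10.00     25.00
Gadget    0.00      30.00
          Total     55.00
"""
INVOICE_B = """Item      Price
Widget    25.00
          Total     25.00
"""

print(parse_lines(INVOICE_A, RULES))
print(parse_lines(INVOICE_B, RULES))
```

In this sketch the order of the rules matters: the first rule set that produces any rows wins, and the remaining ones are not tried.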

rmilecki, Jun 19 '22 21:06

Implemented in #463

bosd, Mar 18 '23 11:03