invoice2data icon indicating copy to clipboard operation
invoice2data copied to clipboard

Automatic generation of regular expression for template file

Open AshNTU opened this issue 6 years ago • 1 comments

One of the challenge we face while creating template is to write the regular expression manually, which can cover all the variations for the field across all the invoice. We end up trying with many variations, which is quite a time consuming task.

Is there a way (some library/api) where we can just give the examples, and the regular expression is automatically generated based on the examples provided.

Example:

["Invoice No. : INV19022853", "Invoice No. : INV21040976"]

and it returns the regular expression:

"Invoice\sNo.\s+:\s+(\w+)"

AshNTU avatar Jun 14 '19 02:06 AshNTU

That's a hard computational program. I'm not aware of any quick solution. With enough sample data a neural network could be trained to cover some aspects of it. While I'd be interested in working on this, I lack the sample data and budget for now.

m3nu avatar Jun 14 '19 02:06 m3nu

We don't have enough manpower to implement such solution as part of this project. I also think it's out of scope of the invoice2data. See also https://github.com/invoice-x/invoice2data/issues/361

rmilecki avatar Jan 22 '23 16:01 rmilecki