Automatic generation of regular expressions for template files
One of the challenges we face when creating templates is writing the regular expressions manually so that they cover all the variations of a field across all invoices. We end up trying many variations, which is quite time-consuming.
Is there a way (some library or API) to just provide examples and have the regular expression generated automatically from them?
Example:
["Invoice No. : INV19022853", "Invoice No. : INV21040976"]
and it returns the regular expression:
"Invoice\sNo\.\s+:\s+(\w+)"
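For illustration, here is a minimal sketch of the naive end of this idea. It is not part of invoice2data, and the names (`infer_field_regex` and its helpers) are hypothetical. It assumes every example is a label that is identical across examples, followed by whitespace and a single value. It takes the longest common prefix of the examples, cuts it back to the last whitespace so a partially shared value such as "INV" stays in the value, turns the label into a literal pattern with flexible whitespace, and generalizes the values into runs of letters, digits and other characters, falling back to `\S+` when the examples disagree in shape.

```python
import re
import string
from itertools import groupby
from os.path import commonprefix
from typing import Sequence


def _char_class(ch: str) -> str:
    """Map a character to a regex class: decimal digit, ASCII letter, or the literal itself."""
    if ch.isdecimal():
        return r"\d"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    return re.escape(ch)


def _label_to_regex(label: str) -> str:
    """Escape the literal label, but let each whitespace run match one or more whitespace characters."""
    parts = re.split(r"(\s+)", label)
    return "".join(r"\s+" if part.isspace() else re.escape(part) for part in parts if part)


def _value_to_regex(values: Sequence[str]) -> str:
    """Generalize the values into runs of character classes (e.g. letters followed by digits)."""
    signatures = {tuple(cls for cls, _ in groupby(v, key=_char_class)) for v in values}
    if len(signatures) != 1:
        # The examples disagree in shape, so fall back to any non-whitespace token.
        return r"(\S+)"
    (signature,) = signatures
    return "(" + "".join(cls + "+" for cls in signature) + ")"


def infer_field_regex(examples: Sequence[str]) -> str:
    """Build a regex with one capture group from lines of the form '<label><whitespace><value>'."""
    if len(examples) < 2:
        raise ValueError("need at least two examples to generalize from")
    prefix = commonprefix(list(examples))
    # Cut the shared prefix back to its last whitespace character, so that a
    # partially shared value (the "INV" in "INV19022853") stays in the value.
    cut = 0
    for i, ch in enumerate(prefix):
        if ch.isspace():
            cut = i + 1
    label = prefix[:cut]
    values = [example[cut:] for example in examples]
    if not label or not all(values):
        raise ValueError("examples need a shared label followed by a non-empty value")
    pattern = _label_to_regex(label) + _value_to_regex(values)
    # Check the generated pattern against every example before returning it.
    for example, value in zip(examples, values):
        match = re.search(pattern, example)
        if match is None or match.group(1) != value:
            raise ValueError(f"generated pattern {pattern!r} does not reproduce {example!r}")
    return pattern


print(infer_field_regex(["Invoice No. : INV19022853", "Invoice No. : INV21040976"]))
# prints: Invoice\s+No\.\s+:\s+([A-Za-z]+\d+)
```

The output would still need a human review before going into a template. Anything beyond this shape (labels that vary between invoices, values spanning several tokens, fields located by position rather than by a label) is where the problem quickly becomes hard.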
That's a hard computational problem, and I'm not aware of any quick solution. With enough sample data, a neural network could be trained to cover some aspects of it. While I'd be interested in working on this, I lack the sample data and the budget for now.
We don't have enough manpower to implement such a solution as part of this project, and I also think it's out of scope for invoice2data. See also https://github.com/invoice-x/invoice2data/issues/361