info: table fields

Open IzzyHibbert opened this issue 4 years ago • 19 comments

Hi guys

I was wondering whether fields represented in a table (like the line-item fields) are supported. If yes, how do I set them up? If not, that would really be nice to have.

IzzyHibbert avatar Aug 19 '20 11:08 IzzyHibbert

I'm not completely sure which fields you're referring to. Do you have a sample image you can display in this ticket?

In general however, it should be possible to train a model for any field in a document as long as it's not a field which can have multiple occurrences. You can train a model even for fields that have multiple occurrences, but you will only be able to use one of the occurrences as the true label and the final extraction of such a model would also only be able to extract a single occurrence for this field.

naiveHobo avatar Aug 25 '20 05:08 naiveHobo

Thank you. You answered me already. I meant multiple occurrences, such as the purchased articles described in the invoice, or "line items". You normally find more than one, so you have Item1, Item2, Item3, and so on.

They typically are represented with a similar vertical and horizontal alignment.

Any chance that this is going to be included in the future, or any idea how to start developing in this direction?

Thanks

IzzyHibbert avatar Aug 25 '20 19:08 IzzyHibbert

Hi @IzzyHibbert , you can try this API dedicated to invoices https://scandocflow.com

ocr-avenger avatar Aug 25 '20 20:08 ocr-avenger

It seems InvoiceNet can't handle tables, for example:

[image: XML_1609163070]

How can we extract the items from the table, given that a custom field takes only a single key-value pair?

mirfan899 avatar Jan 05 '21 04:01 mirfan899

@mirfan899, @IzzyHibbert have you found a solution!?

seanbenhur avatar May 19 '21 11:05 seanbenhur

Nope. Use something else like yolo. I did solve the issue using Yolo3.

mirfan899 avatar May 19 '21 13:05 mirfan899

@mirfan899 That's great. Can you provide a link to the repository?

yackinn avatar Apr 28 '22 17:04 yackinn

https://github.com/ultralytics/yolov3

mirfan899 avatar Apr 29 '22 07:04 mirfan899

@mirfan899 Thank you. I'm not sure how yolo will extract invoice data though. Did you write your custom network?

yackinn avatar Apr 29 '22 07:04 yackinn

I labeled the dataset and then trained a YOLOv3 model. Here are the results using YOLO.

[image: gas]

mirfan899 avatar May 04 '22 02:05 mirfan899

Can't you at least use the OPTIONAL data type for small lists?

r-toroxel avatar Jun 09 '22 14:06 r-toroxel

@mirfan899 Thanks for sharing. Could you use YOLO to extract the line-item details from the table? For example, if you want to extract the payment history lines from your last photo, i.e. something like:

[{"Month": "Dec 2021", "HM3": 0.622, "Current Bill": 433.36, ...}, 
{"Month": "Nov 2021", "HM3": 0.387, ...}]

Is it possible? As far as I understand, neither InvoiceNet nor YOLO can do that.

AhmedHathout avatar Jun 17 '22 10:06 AhmedHathout

Why not? YOLO can solve the table issue. Just label the table and, after detection, use OCR to extract the text.

mirfan899 avatar Jun 17 '22 11:06 mirfan899
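The suggestion above (detect the table, then OCR it) could be sketched roughly as follows. This is not code from anyone in this thread: the `Box` and `Word` types, the `table_text` helper, and all coordinates are hypothetical, and the detector and OCR engine are assumed to have already produced pixel-coordinate boxes. It only keeps the OCR words whose centers fall inside the detected table box and joins them in reading order.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float


class Word(NamedTuple):
    """One OCR word with its bounding box (hypothetical type)."""
    text: str
    box: Box


def center_inside(inner: Box, outer: Box) -> bool:
    """Return True if the center of `inner` lies within `outer`."""
    cx = (inner.x1 + inner.x2) / 2
    cy = (inner.y1 + inner.y2) / 2
    return outer.x1 <= cx <= outer.x2 and outer.y1 <= cy <= outer.y2


def table_text(table: Box, words: List[Word]) -> str:
    """Join the words inside the table box, top to bottom, left to right."""
    inside = [w for w in words if center_inside(w.box, table)]
    inside.sort(key=lambda w: (w.box.y1, w.box.x1))
    return " ".join(w.text for w in inside)


# Made-up detection and OCR output, for illustration only.
table = Box(50, 200, 550, 400)
words = [
    Word("Invoice", Box(60, 40, 140, 60)),   # outside the table
    Word("Month", Box(60, 210, 120, 230)),
    Word("HM3", Box(200, 210, 240, 230)),
    Word("Dec", Box(60, 250, 90, 270)),
    Word("0.622", Box(200, 250, 250, 270)),
]
print(table_text(table, words))
```

The result is a flat string with no column information, so this alone does not tell you which value belongs to which column.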

I guess he's aiming to extract formatted line items with labels, not just text. Extracting text from the table using OCR will just give you some text.

yackinn avatar Jun 17 '22 11:06 yackinn

Thank you both for your quick replies. Yes, I wanted them to be formatted so that I know which text corresponds to which column. I will need to store the extracted data and process it depending on its column.

AhmedHathout avatar Jun 17 '22 11:06 AhmedHathout

I have done something similar. You need to label the columns with YOLO, then detect and OCR. You need more data to get better accuracy: around 50 samples of a single template.

mirfan899 avatar Jun 17 '22 11:06 mirfan899
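One way the column approach described above might produce labeled line items is sketched below. It is a guess at the post-processing, not the commenter's actual code: `rows_from_columns`, the `Box` and `Word` types, the `row_tol` threshold, and every coordinate are hypothetical. It assumes the detector returns one box per column (covering the data cells only) and the OCR engine returns word boxes. Each word is assigned to the column containing its center, then words are grouped into rows by vertical position.

```python
from typing import Dict, List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float


class Word(NamedTuple):
    """One OCR word with its bounding box (hypothetical type)."""
    text: str
    box: Box


def rows_from_columns(
    columns: Dict[str, Box], words: List[Word], row_tol: float = 10.0
) -> List[Dict[str, str]]:
    """Group OCR words into rows of {column name: cell text}."""
    assigned = []  # tuples of (y center, left edge, column name, text)
    for w in words:
        cx = (w.box.x1 + w.box.x2) / 2
        cy = (w.box.y1 + w.box.y2) / 2
        for name, col in columns.items():
            if col.x1 <= cx <= col.x2 and col.y1 <= cy <= col.y2:
                assigned.append((cy, w.box.x1, name, w.text))
                break  # a word belongs to at most one column

    assigned.sort()  # top to bottom, then left to right
    rows: List[Dict[str, str]] = []
    last_y = None
    for cy, _, name, text in assigned:
        # Start a new row when the vertical gap exceeds the tolerance.
        if last_y is None or cy - last_y > row_tol:
            rows.append({})
        row = rows[-1]
        row[name] = f"{row[name]} {text}" if name in row else text
        last_y = cy
    return rows


# Made-up column detections and OCR words, for illustration only.
columns = {"Month": Box(50, 240, 180, 400), "HM3": Box(190, 240, 260, 400)}
words = [
    Word("Dec", Box(60, 250, 90, 270)),
    Word("2021", Box(95, 250, 140, 270)),
    Word("0.622", Box(200, 251, 250, 271)),
    Word("Nov", Box(60, 290, 90, 310)),
    Word("2021", Box(95, 290, 140, 310)),
    Word("0.387", Box(200, 290, 250, 310)),
]
rows = rows_from_columns(columns, words)
print(rows)
```

Cell values come back as strings; converting them to numbers or dates would be a separate step. The fixed `row_tol` is a simplification that would likely need tuning per template.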

Can you show a sample of how you labeled the columns with yolo to detect single line items? I'm also interested in this.

yackinn avatar Jun 17 '22 11:06 yackinn

[image: annotation_table] Like this.

mirfan899 avatar Jun 22 '22 06:06 mirfan899
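Since the annotation image is not reproduced here, the following is only a guess at what column labels could look like on disk. YOLO label files conventionally hold one line per box in the form `class x_center y_center width height`, with coordinates normalized to the image size. The `to_yolo_line` helper, the class IDs, and the pixel boxes below are hypothetical and are not taken from the commenter's dataset.

```python
from typing import Tuple


def to_yolo_line(
    class_id: int, box: Tuple[float, float, float, float], img_w: int, img_h: int
) -> str:
    """Convert a pixel box (x1, y1, x2, y2) into a normalized YOLO label line."""
    x1, y1, x2, y2 = box
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical classes: 0 = "month" column, 1 = "hm3" column,
# drawn on a 1000x1000 pixel page image.
labels = [
    to_yolo_line(0, (50, 240, 180, 400), 1000, 1000),
    to_yolo_line(1, (190, 240, 260, 400), 1000, 1000),
]
print("\n".join(labels))
```

Each column of the table gets its own class, so the detector's output already says which column a region belongs to.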

Can you provide a repository with sample code?

yackinn avatar Jun 23 '22 15:06 yackinn