Annotating Custom KIE Dataset

Open Tanmay98 opened this issue 2 years ago • 5 comments

Hi,

Thanks for developing PaddleOCR. I am trying to fine-tune the KIE model on my custom dataset. My dataset contains form data annotated in Label Studio. However, for relation extraction, Label Studio doesn't provide the linking parameter that the KIE dataset annotation schema requires for training the KIE model.

Can you explain a bit more about this parameter? Is there any workaround for converting Label Studio annotated form data for KIE? My last resort would be to annotate the whole dataset again using PPOCRLabel.

Thanks in advance

Tanmay98 avatar Oct 21 '22 14:10 Tanmay98

Hi, you can refer to

  1. https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/kie/how_to_do_kie_en.md#222-ser--re
  2. https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_en/kie_en.md#12-custom-dataset

for the kie annotation.

littletomatodonkey avatar Oct 24 '22 01:10 littletomatodonkey

Hi @littletomatodonkey, thanks for the links. I have already annotated a batch of data for KIE using Label Studio, so my annotations.json does not contain the linking parameter.

If you could explain how it is calculated between the related entities in a form (key-value pairs), I can convert my Label Studio annotations to match the requirements of the PaddleOCR KIE model.

Here is a sample of how my annotations look right now. Thanks in advance!

Screenshot 2022-10-25 at 1 03 40 PM

Screenshot 2022-10-25 at 1 04 05 PM

Tanmay98 avatar Oct 25 '22 07:10 Tanmay98

In my experience, the labeling tool provided in this repo cannot label linking information.

BigBookPlus avatar Oct 26 '22 02:10 BigBookPlus

@littletomatodonkey @BigBookPlus Then how are we annotating for RE? In the sample annotation provided in the repo, it has the following values in it:

  • Transcription
  • Label
  • Points
  • Linking

I am assuming you use the linking values [x,y] to link a question to its answer in PPOCRLabel, so how is this calculated? Does it use the Euclidean distance between the bounding box coordinates of the question label and the answer label?

Tanmay98 avatar Oct 26 '22 08:10 Tanmay98

Hi @BigBookPlus, sorry, I didn't read the documentation properly earlier. The linking parameter is built by combining the id values of the related fields.

For my Label Studio annotated dataset, this works if I create the linking field manually.
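
A minimal sketch of that manual step, assuming the Label Studio regions and relations have already been read into plain Python lists. The key name `ls_id`, the function `add_linking`, and the sample values are hypothetical and only for illustration. It also assumes, based on my reading of the SER + RE section of the docs linked above, that each text line gets an integer `id` and that both ends of a question-answer pair store the same `[question_id, answer_id]` pair in `linking`. Please check the output against the sample annotation in the repo before training.

```python
import json


def add_linking(entities, relations):
    """Attach `id` and `linking` fields to each annotated text line.

    entities:  list of dicts with "ls_id" (hypothetical name for the region
               id taken from the Label Studio export), "transcription",
               "label" and "points".
    relations: list of (question_ls_id, answer_ls_id) pairs.
    """
    # Map each Label Studio region id to a small integer id.
    id_map = {ent["ls_id"]: idx for idx, ent in enumerate(entities)}

    converted = []
    for ent in entities:
        converted.append({
            "transcription": ent["transcription"],
            "label": ent["label"],
            "points": ent["points"],
            "id": id_map[ent["ls_id"]],
            "linking": [],
        })

    # Record every question-answer pair on both entities it connects.
    for question_ls_id, answer_ls_id in relations:
        q_id, a_id = id_map[question_ls_id], id_map[answer_ls_id]
        converted[q_id]["linking"].append([q_id, a_id])
        converted[a_id]["linking"].append([q_id, a_id])
    return converted


# Made-up sample: one key-value pair from a form.
entities = [
    {"ls_id": "a1", "transcription": "Name:", "label": "question",
     "points": [[10, 10], [80, 10], [80, 30], [10, 30]]},
    {"ls_id": "b2", "transcription": "Jane Doe", "label": "answer",
     "points": [[90, 10], [200, 10], [200, 30], [90, 30]]},
]
relations = [("a1", "b2")]

converted = add_linking(entities, relations)
print(json.dumps(converted, ensure_ascii=False))
```

No distance calculation is involved here: the linking values are just the ids of the two fields that the annotator connected.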

Tanmay98 avatar Nov 02 '22 08:11 Tanmay98

@Tanmay98 do you have an example of the linking field? Which field provides the id values for the related fields?

ariefwijaya avatar May 09 '23 05:05 ariefwijaya