PaddleOCR
Annotating Custom KIE Dataset
Hi,
Thanks for developing PaddleOCR. I am trying to fine-tune the KIE model on my custom dataset. My dataset is form data annotated in Label Studio.
However, for relation extraction, Label Studio does not provide the linking parameter that the KIE dataset annotation schema requires for training the KIE model.
Can you explain a bit more about this parameter? Is there a workaround for converting Label Studio annotated form data for KIE? My last resort would be to annotate the whole dataset again using PPOCRLabel.
Thanks in advance
Hi, you can refer to
- https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/kie/how_to_do_kie_en.md#222-ser--re
- https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_en/kie_en.md#12-custom-dataset
for the kie annotation.
Hi @littletomatodonkey, thanks for the links.
I have already annotated a batch of data for KIE using Label Studio, so my annotations.json does not contain the linking parameter value.
If you could explain how it is calculated between the related entities in a form (key-value pairs), I can convert my Label Studio annotations to meet the requirements of the PaddleOCR KIE model.
Here is a sample of how my annotations look right now. Thanks in advance!
In my experience, the labeling tool provided in this repo cannot label linking information.
@littletomatodonkey @BigBookPlus Then how are the annotations for RE created? The sample annotation provided in the repo has the following values in it:
- Transcription
- Label
- Points
- Linking
I am assuming you use the linking values [x,y] to link a question to an answer in PPOCRLabel, so how are they calculated? Is it the Euclidean distance between the bounding box coordinates of the question label and the answer label?
Hi @BigBookPlus, sorry, I didn't read the documentation properly earlier. The linking parameter is built by combining the id values of the related fields.
For my Label Studio annotated dataset, this works by manually creating the linking field.
@Tanmay98 do you have an example of the linking field? Which field holds the id values for the related fields?
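For reference, here is a minimal sketch of the approach described above, where linking is built from pairs of id values rather than from any distance between boxes. It is not the PaddleOCR or Label Studio API. The function names build_linking and to_label_line are hypothetical, and the input layout (a list of entity dicts plus a list of question/answer id pairs) is an assumed, simplified form of what you would first extract from a Label Studio export. It also assumes that each entry gets an integer id, that both the question and the answer entry carry the same [question_id, answer_id] pair, and that the label file uses one "image path, tab, JSON list" line per image, so check these against the documentation linked earlier in this thread.

```python
import json


def build_linking(entities, relations):
    """Attach integer ids and a linking field to each entity.

    entities:  list of dicts with keys 'id', 'transcription', 'label', 'points'
               (assumed layout; 'id' may be any hashable source id).
    relations: list of (question_source_id, answer_source_id) pairs.
    """
    # Map the source ids (often strings) to consecutive integer ids.
    id_map = {e["id"]: i for i, e in enumerate(entities, start=1)}
    links = {new_id: [] for new_id in id_map.values()}

    for q_src, a_src in relations:
        if q_src not in id_map or a_src not in id_map:
            raise KeyError(f"relation refers to unknown entity: {(q_src, a_src)}")
        pair = [id_map[q_src], id_map[a_src]]
        # Assumption: the same pair is stored on both linked entries.
        links[pair[0]].append(pair)
        links[pair[1]].append(pair)

    converted = []
    for e in entities:
        new_id = id_map[e["id"]]
        converted.append({
            "transcription": e["transcription"],
            "label": e["label"],
            "points": e["points"],
            "id": new_id,
            "linking": links[new_id],  # empty list when the entry has no relation
        })
    return converted


def to_label_line(image_path, entries):
    """Serialize one image's entries as 'path<TAB>json' (assumed label file format)."""
    return f"{image_path}\t{json.dumps(entries, ensure_ascii=False)}"
```

With a question entity "q" and an answer entity "a", calling build_linking(entities, [("q", "a")]) gives both entries the same linking pair, and entities without a relation keep an empty linking list.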