PaddleOCR
PPOCRLabel: Annotation for PaddleOCR KIE and RE
Hello, thank you for providing an annotation tool for PaddleOCR. I was wondering whether the current PPOCRLabel already supports annotation for creating my own dataset to train the PaddleOCR Key Information Extraction (KIE) and Relation Extraction (RE) models, with these fields:
- transcription: stores the text content of the text line
- label: the category of the text line content
- points: stores the four point position information of the text line
- id: stores the ID information of the text line for RE model training
- linking: stores the connection information between text lines for RE model training
If not, could you let me know how I can do it?
Please refer to this document: https://github.com/PaddlePaddle/PaddleOCR/blob/main/PPOCRLabel/README.md
I use PPOCRLabel --kie TRUE, but this only allows me to put class labels in the annotations. I am not sure how to do the linking part with PPOCRLabel. I checked the link provided above, but it has no info on how to annotate these:
- id: stores the ID information of the text line for RE model training
- linking: stores the connection information between text lines for RE model training
Very sorry, PPOCRLabel currently supports annotation only for KIE tasks and does not support annotation for RE tasks.
Oh ok, so does this mean I have to do the annotation manually in that case?
Also, do you know whether the PaddleOCR RE model accepts multiple entities linked to one id?
For the first question: PPOCRLabel currently does not support data annotation for the RE task, so you would need to annotate the relationships between entities manually and provide the links in the format of the RE dataset. This may be a huge project. If you still want to do key information extraction, you can try PP-ChatOCRv2 or UIE. Here are the links:
- PP-ChatOCRv2: https://aistudio.baidu.com/community/app/70303?source=appMineRecent
- UIE: https://github.com/PaddlePaddle/PaddleNLP/blob/develop/applications/information_extraction/README_en.md
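As a rough illustration of what manual RE annotation could look like, here is a minimal sketch that adds the `id` and `linking` fields to entries that already have `transcription`, `label`, and `points`. The helper name `add_ids_and_links`, the image name `form_001.jpg`, the `question`/`answer` labels, and the coordinates are all hypothetical; the one-line-per-image layout (image path, a tab, then a JSON list) is assumed from the PaddleOCR label-file convention and should be checked against the RE dataset you are targeting.

```python
import json


def add_ids_and_links(entries, pairs):
    """Assign 1-based ids and attach each [key_id, value_id] pair to both ends.

    entries: list of dicts with transcription, label, and points.
    pairs:   list of (key_index, value_index) tuples, 0-based into entries.
    """
    annotated = [dict(e, id=i, linking=[]) for i, e in enumerate(entries, start=1)]
    for key_idx, value_idx in pairs:
        key_id, value_id = key_idx + 1, value_idx + 1
        # Store the same pair on both the key entity and the value entity.
        annotated[key_idx]["linking"].append([key_id, value_id])
        annotated[value_idx]["linking"].append([key_id, value_id])
    return annotated


# Hypothetical entries; the coordinates are placeholders.
entries = [
    {"transcription": "Name", "label": "question",
     "points": [[10, 10], [90, 10], [90, 30], [10, 30]]},
    {"transcription": "Alice", "label": "answer",
     "points": [[100, 10], [180, 10], [180, 30], [100, 30]]},
]

annotated = add_ids_and_links(entries, pairs=[(0, 1)])

# Assumed label-file layout: image path, a tab, then the JSON list.
line = "form_001.jpg\t" + json.dumps(annotated, ensure_ascii=False)
print(line)
```

Keeping the pairs in a separate list and generating `linking` from it avoids having to type each pair twice by hand, which is where manual annotation tends to go wrong.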
As for the second question, do you mean one key linked to multiple values, or multiple keys linked to one value?
Okay, thank you, I will have a look at the links you sent.
For the second question, I am working with vaccine names and dates. For one vaccine name there can be multiple dates associated with it. Let's say 2 dates will be linked to 1 vaccine name. Is this possible?
- id: 1 -> vaccine X
- id: 2 -> 1st date
- id: 3 -> 2nd date
So the linking in the annotation would be:
- [1, 2], [1, 3] for vaccine X
- [1, 2] for the 1st date
- [1, 3] for the 2nd date
Is this possible? And is this the correct way to set up the linking?
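To make the proposed one-to-many linking concrete, here is a minimal sketch that writes the vaccine example as a single annotation line and checks that the links are internally consistent. It only illustrates the layout described in the question; it does not confirm that the RE model accepts one key linked to several values. The image name `vaccine_card.jpg`, the `question`/`answer` labels, the date strings, the coordinates, and the `check_links` helper are all hypothetical.

```python
import json

# Hypothetical annotation for one image; coordinates are placeholders.
entities = [
    {"transcription": "vaccine X", "label": "question",
     "points": [[10, 10], [110, 10], [110, 30], [10, 30]],
     "id": 1, "linking": [[1, 2], [1, 3]]},
    {"transcription": "2021-01-05", "label": "answer",
     "points": [[120, 10], [220, 10], [220, 30], [120, 30]],
     "id": 2, "linking": [[1, 2]]},
    {"transcription": "2021-02-05", "label": "answer",
     "points": [[230, 10], [330, 10], [330, 30], [230, 30]],
     "id": 3, "linking": [[1, 3]]},
]
label_line = "vaccine_card.jpg\t" + json.dumps(entities)


def check_links(line):
    """Return a list of problems found in the linking fields of one label line."""
    _, payload = line.split("\t", 1)
    items = json.loads(payload)
    by_id = {item["id"]: item for item in items}
    problems = []
    for item in items:
        for pair in item["linking"]:
            if item["id"] not in pair:
                problems.append(f"id {item['id']} stores a pair it is not part of: {pair}")
            for other in pair:
                if other not in by_id:
                    problems.append(f"pair {pair} references unknown id {other}")
                elif pair not in by_id[other]["linking"]:
                    problems.append(f"pair {pair} is missing on id {other}")
    return problems


problems = check_links(label_line)
print(problems or "links are consistent")
```

A check like this is useful when `id` and `linking` are edited by hand, since a pair that appears on only one of its two entities is easy to miss.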
Maybe this document can help you. https://github.com/PaddlePaddle/PaddleOCR/blob/main/ppstructure/kie/how_to_do_kie_en.md#222-ser--re
ah ok thank you so much!! the link is very helpful
Hi again. I have been exploring PaddleNLP lately, and I want to ask whether it is possible to integrate the PaddleOCR detection and recognition models with the UIE relation extraction model from PaddleNLP?
We would like to see developers try this, but there are currently no official plans to do so.
Hello, I managed to find a workaround for the KIE annotations.
I just want to clarify: for training both SER and RE, would I use the same dataset, with the id and the linking values?
Feel free to submit a PR to address this issue.