PaddleOCR
How to get annotations in prescribed format to perform KIE [SER and RE] training.
- System Environment: Windows (Python 3.10)
- Version: Paddle: PaddleOCR: Related components:
- Command Code:
PPOCRLabel --kie True
- Complete Error Message:
Hey, I want to build a KIE model on a custom dataset. After annotating it with the PPOCRLabel KIE annotation tool, I was able to get annotations in the format shown below.
Command used to start PPOCRLabel: PPOCRLabel --kie True
" image path annotation information "
BYD-040723-231_1.png [{"transcription": "Johnstrnls", "points": [[120, 192], [264, 192], [264, 260], [120, 260]], "difficult": false, "key_cls": "company"}, {"transcription": "23.06.2023", "points": [[823, 494], [918, 494], [918, 538], [823, 538]], "difficult": false, "key_cls": "date"}, {"transcription": "Invoice Number", "points": [[588, 489], [707, 489], [707, 513], [588, 513]], "difficult": false, "key_cls": "invk"}, {"transcription": "9280011754", "points": [[685, 539], [590, 539], [590, 512], [685, 512]], "difficult": false, "key_cls": "invv"}, {"transcription": " Johnson Controls (India) Private L mited", "points": [[84, 294], [358, 294], [358, 319], [84, 319]], "difficult": false, "key_cls": "company"}]
...
How do I get the annotations into the prescribed format below, which is needed for KIE (SER and RE) training?
" image path annotation information "
zh_train_0.jpg [{"transcription": "汇丰晋信", "label": "other", "points": [[104, 114], [530, 114], [530, 175], [104, 175]], "id": 1, "linking": []}, {"transcription": "受理时间:", "label": "question", "points": [[126, 267], [266, 267], [266, 305], [126, 305]], "id": 7, "linking": [[7, 13]]}, {"transcription": "2020.6.15", "label": "answer", "points": [[321, 239], [537, 239], [537, 285], [321, 285]], "id": 13, "linking": [[7, 13]]}]
zh_train_1.jpg [{"transcription": "中国人体器官捐献", "label": "other", "points": [[544, 459], [954, 459], [954, 517], [544, 517]], "id": 1, "linking": []}, {"transcription": ">编号:MC545715483585", "label": "other", "points": [[1462, 470], [2054, 470], [2054, 543], [1462, 543]], "id": 10, "linking": []}, {"transcription": "CHINAORGANDONATION", "label": "other", "points": [[543, 516], [958, 516], [958, 551], [543, 551]], "id": 14, "linking": []}, {"transcription": "中国人体器官捐献志愿登记表", "label": "header", "points": [[635, 793], [1892, 793], [1892, 904], [635, 904]], "id": 18, "linking": []}]
...
How do I get the "id" and "linking" information from the PPOCRLabel annotation tool? Is there any other annotation tool available for KIE (SER and RE)?
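The fields that differ between the two formats can be partly filled in mechanically. Below is a minimal Python sketch of that step. It is not a PPOCRLabel feature. It assumes the image path and the JSON list are separated by a tab, and that each `key_cls` value can be mapped to a `label`. The function name `convert_line` and the `label_map` argument are made up for illustration. It assigns sequential `id` values and leaves `linking` empty, because the PPOCRLabel output above carries no relation information.

```python
import json


def convert_line(line, label_map=None):
    """Convert one PPOCRLabel KIE line to the id/linking layout.

    Assumption: path and JSON payload are separated by a tab.
    label_map is a hypothetical dict such as {"invk": "question"}.
    """
    label_map = label_map or {}
    path, _, payload = line.rstrip("\n").partition("\t")
    converted = []
    for idx, box in enumerate(json.loads(payload), start=1):
        key_cls = box.get("key_cls", "other")
        converted.append({
            "transcription": box["transcription"],
            # Fall back to the original key_cls when no mapping is given.
            "label": label_map.get(key_cls, key_cls),
            "points": box["points"],
            "id": idx,
            # Placeholder: question/answer pairs still have to be supplied.
            "linking": [],
        })
    return path + "\t" + json.dumps(converted, ensure_ascii=False)
```

Calling `convert_line(line, {"invk": "question", "invv": "answer"})` on each line of the label file would give entries with `id` and an empty `linking` list. The mapping from `invk`/`invv` to `question`/`answer` is only a guess at what those labels mean. The actual pairs for RE still need to come from somewhere else.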
+1
I'm reading in older posts that this has to be done manually 🥲
Manual annotation is nearly impossible for a large dataset.
I hope we get some answers soon.
@sagarjgb @sagarjgborg Have you found a way around this?
Nope, looking forward to an answer.
I found this, which could be helpful: https://gitee.com/paddlepaddle/PaddleOCR/blob/release/2.6/applications/%E5%8F%91%E7%A5%A8%E5%85%B3%E9%94%AE%E4%BF%A1%E6%81%AF%E6%8A%BD%E5%8F%96.md#431-%E5%87%86%E5%A4%87%E6%95%B0%E6%8D%AE
+1, following. I need to do these annotations, but PPOCRLabel does not allow this.
Hello, has anyone figured out a simpler way to annotate this, or is manual annotation the only way to go?
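One possible starting point, which nobody in this thread has confirmed, is to pre-fill `linking` with a simple geometric guess and then correct it by hand. The sketch below is a heuristic of my own, not part of PaddleOCR. It pairs each `question` box with the nearest unused `answer` box by center distance. The names `propose_links` and `max_dist` are hypothetical, and the 200-pixel cutoff is an arbitrary assumption that would need tuning per document layout.

```python
import math


def center(points):
    """Return the mean x and y of a box's corner points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def propose_links(boxes, max_dist=200.0):
    """Heuristic: link each question to its nearest unused answer.

    boxes use the target layout (label, points, id, linking).
    max_dist is an assumed pixel cutoff, not a PaddleOCR setting.
    """
    answers = [b for b in boxes if b["label"] == "answer"]
    used = set()
    for q in boxes:
        if q["label"] != "question":
            continue
        qx, qy = center(q["points"])
        best, best_d = None, max_dist
        for a in answers:
            if a["id"] in used:
                continue
            ax, ay = center(a["points"])
            d = math.hypot(ax - qx, ay - qy)
            if d <= best_d:
                best, best_d = a, d
        if best is not None:
            used.add(best["id"])
            # The same [question_id, answer_id] pair goes on both boxes.
            q["linking"].append([q["id"], best["id"]])
            best["linking"].append([q["id"], best["id"]])
    return boxes
```

This will get pairs wrong on dense or multi-column layouts, so the output is only a draft to review, not a replacement for checking the links.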