
How to get annotations in the prescribed format for KIE (SER and RE) training

Open: sagarjgb opened this issue 1 year ago (9 comments)

  • System environment: Windows (Python 3.10)
  • Version: Paddle: PaddleOCR:
  • Related components:
  • Command: PPOCRLabel --kie True
  • Complete error message:

Hey, I want to build a KIE model on a custom dataset. Using the PPOCRLabel KIE annotation tool, I was able to get annotations for my dataset in the format shown below.

Command used to start PPOCRLabel: PPOCRLabel --kie True

" image path                 annotation information "
BYD-040723-231_1.png	[{"transcription": "Johnstrnls", "points": [[120, 192], [264, 192], [264, 260], [120, 260]], "difficult": false, "key_cls": "company"}, {"transcription": "23.06.2023", "points": [[823, 494], [918, 494], [918, 538], [823, 538]], "difficult": false, "key_cls": "date"}, {"transcription": "Invoice Number", "points": [[588, 489], [707, 489], [707, 513], [588, 513]], "difficult": false, "key_cls": "invk"}, {"transcription": "9280011754", "points": [[685, 539], [590, 539], [590, 512], [685, 512]], "difficult": false, "key_cls": "invv"}, {"transcription": " Johnson Controls (India) Private L mited", "points": [[84, 294], [358, 294], [358, 319], [84, 319]], "difficult": false, "key_cls": "company"}]
...
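For reference, each line of the PPOCRLabel KIE output above is an image path followed by a JSON array of boxes, apparently separated by a tab. A minimal Python sketch for reading one such line, assuming that tab separator (the function name parse_ppocrlabel_line is hypothetical, not part of PaddleOCR):

```python
import json


def parse_ppocrlabel_line(line):
    """Split one PPOCRLabel KIE line into (image_path, list_of_boxes).

    Assumes the layout shown above: image path, a tab, then a JSON array.
    """
    image_path, annotation = line.rstrip("\n").split("\t", 1)
    return image_path, json.loads(annotation)


# Two boxes taken from the BYD-040723-231_1.png line above.
sample = (
    "BYD-040723-231_1.png\t"
    '[{"transcription": "Invoice Number", "points": [[588, 489], [707, 489], [707, 513], [588, 513]], '
    '"difficult": false, "key_cls": "invk"}, '
    '{"transcription": "9280011754", "points": [[685, 539], [590, 539], [590, 512], [685, 512]], '
    '"difficult": false, "key_cls": "invv"}]'
)

path, boxes = parse_ppocrlabel_line(sample)
for box in boxes:
    print(box["key_cls"], box["transcription"])
# prints:
# invk Invoice Number
# invv 9280011754
```

Note that this output carries a key_cls per box but nothing that ties a key box (invk) to its value box (invv).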

How do I get the annotations into the format below, which is what KIE (SER and RE) training expects?

" image path                 annotation information "
zh_train_0.jpg   [{"transcription": "汇丰晋信", "label": "other", "points": [[104, 114], [530, 114], [530, 175], [104, 175]], "id": 1, "linking": []}, {"transcription": "受理时间:", "label": "question", "points": [[126, 267], [266, 267], [266, 305], [126, 305]], "id": 7, "linking": [[7, 13]]}, {"transcription": "2020.6.15", "label": "answer", "points": [[321, 239], [537, 239], [537, 285], [321, 285]], "id": 13, "linking": [[7, 13]]}]
zh_train_1.jpg   [{"transcription": "中国人体器官捐献", "label": "other", "points": [[544, 459], [954, 459], [954, 517], [544, 517]], "id": 1, "linking": []}, {"transcription": ">编号:MC545715483585", "label": "other", "points": [[1462, 470], [2054, 470], [2054, 543], [1462, 543]], "id": 10, "linking": []}, {"transcription": "CHINAORGANDONATION", "label": "other", "points": [[543, 516], [958, 516], [958, 551], [543, 551]], "id": 14, "linking": []}, {"transcription": "中国人体器官捐献志愿登记表", "label": "header", "points": [[635, 793], [1892, 793], [1892, 904], [635, 904]], "id": 18, "linking": []}]
...
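In the target format, every entity carries a per-image "id", and "linking" holds [key_id, value_id] pairs; in the zh_train_0.jpg sample, the pair [7, 13] appears on both the "question" entity (id 7) and the "answer" entity (id 13). The sketch below checks that structure on the sample. The function check_linking is hypothetical, and the rule that every pair must be mirrored on both entities is an assumption drawn from this sample, not a documented PaddleOCR requirement.

```python
import copy
import json


def check_linking(entities):
    """List inconsistencies in the "id"/"linking" fields of one image's entities."""
    by_id = {e["id"]: e for e in entities}
    problems = []
    if len(by_id) != len(entities):
        problems.append("duplicate ids")
    for e in entities:
        for pair in e["linking"]:
            # Each pair should be [key_id, value_id] and include this entity.
            if len(pair) != 2 or e["id"] not in pair:
                problems.append(f"id {e['id']}: unexpected pair {pair}")
                continue
            for other_id in pair:
                other = by_id.get(other_id)
                if other is None:
                    problems.append(f"id {e['id']}: pair {pair} refers to unknown id {other_id}")
                elif pair not in other["linking"]:
                    problems.append(f"id {other_id}: missing mirrored pair {pair}")
    return problems


# The zh_train_0.jpg annotation from the sample above.
entities = json.loads(
    '[{"transcription": "汇丰晋信", "label": "other", "points": [[104, 114], [530, 114], [530, 175], [104, 175]], "id": 1, "linking": []}, '
    '{"transcription": "受理时间:", "label": "question", "points": [[126, 267], [266, 267], [266, 305], [126, 305]], "id": 7, "linking": [[7, 13]]}, '
    '{"transcription": "2020.6.15", "label": "answer", "points": [[321, 239], [537, 239], [537, 285], [321, 285]], "id": 13, "linking": [[7, 13]]}]'
)
print(check_linking(entities))  # prints []

# Dropping the pair from the answer entity is reported once.
broken = copy.deepcopy(entities)
broken[2]["linking"] = []
print(check_linking(broken))  # prints ['id 13: missing mirrored pair [7, 13]']
```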

How do I get the "id" and "linking" fields out of the PPOCRLabel annotation tool? Is there any other annotation tool that supports KIE (SER and RE) labeling?
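One possible workaround, sketched under clearly stated assumptions: keep annotating with PPOCRLabel --kie True, then post-process Label.txt with a script that maps each key_cls to a SER label, assigns ids, and links keys to values with a geometric heuristic. KEY_TO_LABEL, KEY_TO_VALUE_CLS, center and convert are hypothetical names; mapping invk/invv to question/answer and the "nearest box of the paired class" rule are assumptions about this invoice dataset, not PaddleOCR features. Links produced this way would still need spot-checking.

```python
import json
import math

# Hypothetical mapping from this dataset's key_cls values to SER labels.
# Any key_cls not listed here becomes "other".
KEY_TO_LABEL = {"invk": "question", "invv": "answer"}

# Hypothetical pairing rule: for a key class, the class that holds its value.
KEY_TO_VALUE_CLS = {"invk": "invv"}


def center(points):
    """Mean of the box's corner coordinates."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def convert(boxes):
    """Turn PPOCRLabel KIE boxes into entities with label, id and linking."""
    entities = [
        {
            "transcription": box["transcription"],
            "label": KEY_TO_LABEL.get(box["key_cls"], "other"),
            "points": box["points"],
            "id": i,  # sequential ids, unique within this image
            "linking": [],
        }
        for i, box in enumerate(boxes, start=1)
    ]
    # Heuristic: link each key box to the nearest box of its paired value class.
    for key_box, key_ent in zip(boxes, entities):
        value_cls = KEY_TO_VALUE_CLS.get(key_box["key_cls"])
        if value_cls is None:
            continue
        candidates = [
            (math.dist(center(key_box["points"]), center(box["points"])), ent)
            for box, ent in zip(boxes, entities)
            if box["key_cls"] == value_cls
        ]
        if not candidates:
            continue
        _, value_ent = min(candidates, key=lambda c: c[0])
        pair = [key_ent["id"], value_ent["id"]]
        key_ent["linking"].append(pair)
        value_ent["linking"].append(list(pair))
    return entities


# Three boxes taken from the BYD-040723-231_1.png line above.
boxes = [
    {"transcription": "23.06.2023", "points": [[823, 494], [918, 494], [918, 538], [823, 538]], "difficult": False, "key_cls": "date"},
    {"transcription": "Invoice Number", "points": [[588, 489], [707, 489], [707, 513], [588, 513]], "difficult": False, "key_cls": "invk"},
    {"transcription": "9280011754", "points": [[685, 539], [590, 539], [590, 512], [685, 512]], "difficult": False, "key_cls": "invv"},
]
entities = convert(boxes)
print("BYD-040723-231_1.png\t" + json.dumps(entities, ensure_ascii=False))
```

Each converted line is the image path, a tab (assumed separator), and json.dumps(entities, ensure_ascii=False). The sample's ids are not consecutive (1, 7, 13), so sequential ids are assumed to be acceptable as long as they are unique within an image.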

sagarjgb, Sep 10 '23

+1

Rishi-NF, Sep 13 '23

I'm reading in older posts that this has to be done manually 🥲

Rishi-NF, Sep 13 '23

Manual annotation is nearly impossible for a large dataset.

sagarjgborg, Sep 14 '23

I hope we get some answers soon.

Rishi-NF, Sep 14 '23

@sagarjgb @sagarjgborg Have you found a way around this?

Rishi-NF, Sep 18 '23

Nope, I'm looking forward to an answer.

sagarjgb, Sep 21 '23

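I found this, which could be helpful: "https://gitee.com/paddlepaddle/PaddleOCR/blob/release/2.6/applications/%E5%8F%91%E7%A5%A8%E5%85%B3%E9%94%AE%E4%BF%A1%E6%81%AF%E6%8A%BD%E5%8F%96.md#431-%E5%87%86%E5%A4%87%E6%95%B0%E6%8D%AE" (the PaddleOCR release/2.6 application guide on invoice key information extraction, section 4.3.1, "Prepare data").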

Rishi-NF, Sep 21 '23

+1, following. I need to do these annotations, but PPOCRLabel does not support it.

oscarjose9423, Nov 16 '23

Hello, has anyone figured out a simpler way to annotate this, or is manual annotation the only way to go?

piarosebelledelapaz, May 07 '24