
How to make training data for training key information extraction (KIE)

Open hmiche opened this issue 2 years ago • 4 comments

Hello guys. How can I make my own training set to train a custom key information extraction model? Are there any tools available to do that?

hmiche avatar May 25 '22 11:05 hmiche

Hi, MMOCR currently does not provide a data converter for the KIE task. If you would like to train with your custom data, you may refer to the data preparation steps of the supported WildReceipt dataset and convert your own data to the same format. In addition, please feel free to raise a PR if you would like to add such a data converter to MMOCR.

xinke-wang avatar May 26 '22 01:05 xinke-wang
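
For readers following along, here is a minimal sketch of what such a conversion could look like. It assumes the target is the one-JSON-object-per-line layout used by the annotation posted further down in this thread (`file_name`, `height`, `width`, and an `annotations` list of `box`/`text`/`label` entries), and that the source data has axis-aligned rectangles given as `(x, y, w, h)`. The function names (`rect_to_quad`, `to_record`, `dump_records`) and the `rect` input key are hypothetical and are not part of MMOCR.

```python
import json


def rect_to_quad(x, y, w, h):
    """Turn an (x, y, w, h) rectangle into an 8-value box.

    Corner order: top-left, top-right, bottom-right, bottom-left,
    which matches the boxes in the annotation posted in this thread.
    """
    return [x, y, x + w, y, x + w, y + h, x, y + h]


def to_record(file_name, height, width, items):
    """Build one record (one image) from a list of labeled text regions.

    `items` is a hypothetical input layout: dicts with "rect", "text", "label".
    """
    annotations = [
        {
            "box": rect_to_quad(*item["rect"]),
            "text": item["text"],
            "label": item["label"],
        }
        for item in items
    ]
    return {
        "file_name": file_name,
        "height": height,
        "width": width,
        "annotations": annotations,
    }


def dump_records(records):
    """Serialize records as JSON lines: exactly one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Illustrative values only; the file name and numbers are made up.
example = to_record(
    "image_files/sample.png",
    1168,
    1653,
    [{"rect": (10, 20, 30, 40), "text": "total", "label": 1}],
)
```

Writing `dump_records([example])` to a text file gives one line per image. Check the field names and the expected label type against the WildReceipt data preparation steps before training.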

OK, thank you very much. Also, I didn't understand the notion of key/value. When I'm annotating my data, where should I mark the key and where should I select the value?

hmiche avatar May 26 '22 08:05 hmiche

Hi, you may refer to this tutorial to learn about keys and values in KIE; it provides some examples of key-value pairs.

xinke-wang avatar May 26 '22 09:05 xinke-wang

I made this annotation. Does it look correct?

{"file_name": "image_files/2d21105d-template19319.png", "height": 1168, "width": 1653, "annotations": [{"box": [835.8682269716757, 401.69202586323775, 864.5974563514809, 401.69202586323775, 864.5974563514809, 420.4222971786585, 835.8682269716757, 420.4222971786585], "text": "gt", "label": "1"}, {"box": [868.344747140151, 399.1946563545149, 893.3266857312859, 399.1946563545149, 893.3266857312859, 420.4222971786585, 868.344747140151, 420.4222971786585], "text": "loc", "label": "1"}, {"box": [842.1137116194595, 429.1630904591881, 1040.720123418983, 429.1630904591881, 1040.720123418983, 450.39073128333166, 842.1137116194595, 450.39073128333166], "text": "10,rueazrou,quartier", "label": "2"}, {"box": [839.6155177603459, 461.62889407258416, 957.0306291386806, 461.62889407258416, 957.0306291386806, 484.1052196510891, 839.6155177603459, 484.1052196510891], "text": "10020,rabat", "label": "2"}, {"box": [840.8646146899025, 495.3433824403416, 1054.460189644107, 495.3433824403416, 1054.460189644107, 529.057870808099, 840.8646146899025, 529.057870808099], "text": "8560451551218746063", "label": "3"}, {"box": [838.3664208307896, 366.7288527411188, 924.5541089702053, 366.7288527411188, 924.5541089702053, 385.4591240565397, 838.3664208307896, 385.4591240565397], "text": "b810-01", "label": "0"}, {"box": [1009.4864197530864, 366.72932038834944, 1029.4630041152263, 366.72932038834944, 1029.4630041152263, 385.4626796116504, 1009.4864197530864, 385.4626796116504], "text": "r", "label": "0"}]}
{"file_name": "image_files/36c100a6-template19318.png", "height": 1168, "width": 1653, "annotations": [{"box": [840.8532098765431, 402.69918446601946, 868.0630864197531, 402.69918446601946, 868.0630864197531, 423.8592621359224, 840.8532098765431, 423.8592621359224], "text": "ya", "label": "1"}, {"box": [872.5980658436213, 396.64372815533983, 967.8099588477365, 396.64372815533983, 967.8099588477365, 420.8201941747573, 872.5980658436213, 420.8201941747573], "text": "restaurant", "label": "1"}, {"box": [837.8374485596707, 426.87565048543695, 1106.8751028806585, 426.87565048543695, 1106.8751028806585, 454.068504854369, 837.8374485596707, 454.068504854369], "text": "7,avanueabdelkarimbenjelloun", "label": "2"}, {"box": [843.8916460905351, 461.6208155339806, 958.7626748971195, 461.6208155339806, 958.7626748971195, 482.7808932038835, 843.8916460905351, 482.7808932038835], "text": "10020,rabat", "label": "2"}, {"box": [839.3566666666667, 499.3823689320389, 1050.9588065843623, 499.3823689320389, 1050.9588065843623, 520.5424466019418, 839.3566666666667, 520.5424466019418], "text": "5420565926897256814", "label": "3"}, {"box": [837.8374485596707, 364.9149514563107, 922.4828395061728, 364.9149514563107, 922.4828395061728, 386.0750291262136, 837.8374485596707, 386.0750291262136], "text": "c810-01", "label": "0"}, {"box": [1008.6247736625514, 366.4344854368932, 1026.7646913580247, 366.4344854368932, 1026.7646913580247, 386.0750291262136, 1008.6247736625514, 386.0750291262136], "text": "r", "label": "0"}]}
{"file_name": "image_files/eca06b46-template19316.png", "height": 1168, "width": 1653, "annotations": [{"box": [840.4450617283951, 401.33840776699026, 879.6726337448561, 401.33840776699026, 879.6726337448561, 419.91300970873783, 840.4450617283951, 419.91300970873783], "text": "star", "label": "1"}, {"box": [881.736049382716, 401.33840776699026, 919.9205761316872, 401.33840776699026, 919.9205761316872, 418.892427184466, 881.736049382716, 418.892427184466], "text": "craft", "label": "1"}, {"box": [841.4654320987654, 430.23223300970875, 1028.3065843621398, 430.23223300970875, 1028.3065843621398, 453.977786407767, 841.4654320987654, 453.977786407767], "text": "avenuetarikibnziad", "label": "2"}, {"box": [843.5288477366256, 462.2331650485437, 959.1481481481483, 462.2331650485437, 959.1481481481483, 482.8716116504854, 843.5288477366256, 482.8716116504854], "text": "10020,rabat", "label": "2"}, {"box": [840.4450617283951, 495.25467961165043, 1051.0268312757203, 495.25467961165043, 1051.0268312757203, 524.1485048543689, 840.4450617283951, 524.1485048543689], "text": "5638984273905984178", "label": "3"}, {"box": [840.4450617283951, 365.2324660194175, 927.1538683127573, 365.2324660194175, 927.1538683127573, 388.9780194174757, 840.4450617283951, 388.9780194174757], "text": "c810-01", "label": "0"}, {"box": [1010.7788888888889, 367.2963106796116, 1027.2862139917695, 367.2963106796116, 1027.2862139917695, 385.8709126213592, 1010.7788888888889, 385.8709126213592], "text": "r", "label": "0"}]}
{"file_name": "image_files/51ecae9f-template19317.png", "height": 1168, "width": 1653, "annotations": [{"box": [843.3628085490162, 397.9459716001536, 898.3230734495133, 397.9459716001536, 898.3230734495133, 424.1683514417427, 843.3628085490162, 424.1683514417427], "text": "setrap", "label": "1"}, {"box": [900.8212673086267, 399.1946563545149, 962.0270168569076, 399.1946563545149, 962.0270168569076, 424.16835144174263, 900.8212673086267, 424.16835144174263], "text": "traveaux", "label": "1"}, {"box": [965.7743076455781, 399.1946563545149, 1016.987281757405, 399.1946563545149, 1016.987281757405, 419.17361242429706, 965.7743076455781, 419.17361242429706], "text": "divers", "label": "1"}, {"box": [842.1137116194595, 429.1630904591881, 999.4999247436102, 429.1630904591881, 999.4999247436102, 452.8881007920544, 842.1137116194595, 452.8881007920544], "text": "2,rueidrissel", "label": "2"}, {"box": [842.1137116194595, 461.62889407258416, 947.0378537022266, 461.62889407258416, 947.0378537022266, 479.1104806336436, 842.1137116194595, 479.1104806336436], "text": "111,rabat", "label": "2"}, {"box": [842.1137116194595, 497.84075194906427, 1053.2110927145504, 497.84075194906427, 1053.2110927145504, 521.5657622819306, 842.1137116194595, 521.5657622819306], "text": "8764814411031892864", "label": "3"}, {"box": [838.3589711934156, 367.9766990291262, 929.5347325102881, 367.9766990291262, 929.5347325102881, 387.95743689320386, 838.3589711934156, 387.95743689320386], "text": "c810-01", "label": "0"}, {"box": [1004.4979423868313, 366.72932038834944, 1029.4856790123456, 366.72932038834944, 1029.4856790123456, 386.7100582524271, 1004.4979423868313, 386.7100582524271], "text": "r", "label": "0"}]}

hmiche avatar May 26 '22 09:05 hmiche
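
One way to answer "does it look correct?" for an annotation file like the one above is a small structural check run over each line. The sketch below is not an MMOCR tool, and `check_record` and `REQUIRED_KEYS` are hypothetical names. It assumes one JSON object per line, 8-value boxes that must lie inside the image, and integer labels. The integer-label rule is an assumption based on my understanding of the WildReceipt files, so verify it against the dataset before relying on it. The sample line uses rounded, made-up coordinates.

```python
import json

# Top-level fields seen in the annotation posted in this thread.
REQUIRED_KEYS = {"file_name", "height", "width", "annotations"}


def check_record(line):
    """Return a list of human-readable problems found in one JSON line."""
    record = json.loads(line)
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    problems = []
    for i, ann in enumerate(record["annotations"]):
        box = ann.get("box", [])
        if len(box) != 8:
            problems.append(f"annotation {i}: box has {len(box)} values, expected 8")
        else:
            xs, ys = box[0::2], box[1::2]  # x and y coordinates alternate
            inside = (
                0 <= min(xs) and max(xs) <= record["width"]
                and 0 <= min(ys) and max(ys) <= record["height"]
            )
            if not inside:
                problems.append(f"annotation {i}: box lies outside the image")
        if not isinstance(ann.get("text"), str):
            problems.append(f"annotation {i}: text is not a string")
        # Assumption: labels are expected to be integers, not strings like "1".
        if not isinstance(ann.get("label"), int):
            problems.append(f"annotation {i}: label {ann.get('label')!r} is not an integer")
    return problems


# Illustrative line with a string label, as in the annotation above.
sample = (
    '{"file_name": "image_files/sample.png", "height": 1168, "width": 1653, '
    '"annotations": [{"box": [835.9, 401.7, 864.6, 401.7, 864.6, 420.4, 835.9, 420.4], '
    '"text": "gt", "label": "1"}]}'
)
print(check_record(sample))
```

Under these assumptions the sample is flagged only for its string label. The check covers structure only; it cannot tell whether each label value corresponds to the intended key or value class.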