PaddleOCR Data annotation issue for Custom detection training


anidiatm41 opened this issue • 4 comments

Hi team, I am trying to retrain the detection model on my custom data (I want to use ResNet50 and MobileNetV3 backbones).

My data is in the following CSV format:

Image_name,Ground_Truth
1.png,bank name
2.png,HSBB Bank
3.png,Debited amount 9.6M
4.png,Invoice no:100045678
...

I am following your instructions for detection model training (https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_en/dataset/ocr_datasets_en.md). As part of that, I need to convert my annotations into the following format for training:

====
1.1 PaddleOCR text detection format annotation

The annotation file format supported by the PaddleOCR text detection algorithm is as follows, with the two fields separated by "\t":

"Image file name    Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
====

I am not sure how to annotate my data in this format so that I can start training. Please help.
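The detection format quoted in the question is one line per image: the image path, a tab, and a JSON list with one `{"transcription": ..., "points": ...}` object per text region. A minimal Python sketch of producing such lines from the CSV follows. The CSV has no box coordinates, which detection training needs, so the sketch rests on a loud assumption: each PNG is a tight crop of a single text line, and the whole image rectangle is used as its box. If your images are full documents, that assumption does not hold and the boxes have to be drawn, e.g. with an annotation tool. `make_det_line`, `csv_to_det_lines` and the `sizes` mapping are hypothetical helpers, and the CSV is assumed to be comma-separated with a header row.

```python
import csv
import io
import json
from typing import Dict, List, Tuple


def make_det_line(image_path: str, annotations: List[dict]) -> str:
    """One detection label line: image path, a tab, then json.dumps(annotations)."""
    # ensure_ascii=False keeps non-ASCII transcriptions human-readable.
    return image_path + "\t" + json.dumps(annotations, ensure_ascii=False)


def csv_to_det_lines(csv_text: str, sizes: Dict[str, Tuple[int, int]]) -> List[str]:
    """Convert Image_name,Ground_Truth rows into detection label lines.

    ASSUMPTION: every image is a tight crop of one text line, so the full
    image rectangle is used as the box. `sizes` (image name -> (width, height))
    is a hypothetical input you would have to supply, e.g. via Pillow.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        w, h = sizes[row["Image_name"]]
        # Corners clockwise from top-left, the same order as the img_61 example.
        points = [[0, 0], [w, 0], [w, h], [0, h]]
        ann = {"transcription": row["Ground_Truth"], "points": points}
        lines.append(make_det_line(row["Image_name"], [ann]))
    return lines


# Rows taken from the issue; the image sizes are made-up placeholders.
sample = "Image_name,Ground_Truth\n1.png,bank name\n4.png,Invoice no:100045678\n"
for line in csv_to_det_lines(sample, {"1.png": (200, 32), "4.png": (320, 32)}):
    print(line)
```

`csv.DictReader` is used rather than splitting on commas so that a transcription containing a comma still parses, provided it is quoted in the file.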

Aug 10 '22 03:08 anidiatm41

@bely66 please help.

Aug 10 '22 06:08 anidiatm41

You can use the PPOCRLabel tool to annotate your own dataset: https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5/PPOCRLabel

Aug 10 '22 09:08 andyjiang1116

Thanks, it works!

I'm curious about the underlying architecture or technique behind the annotation tool.

Aug 10 '22 15:08 anidiatm41

It uses the PP-OCR model to generate the annotations automatically; you can then correct any wrong annotations by hand.
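The "auto-label, then fix mistakes" workflow can be illustrated with a small sketch. This is not PPOCRLabel's code; it only shows the idea. It assumes each detected line arrives in the PaddleOCR 2.x result shape `[box, (text, confidence)]` (some releases wrap results in an extra per-image list, so check your installed version), converts it into the `transcription`/`points` annotation format, and flags low-confidence lines for manual review. `to_annotations` and the 0.9 threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

# ASSUMED shape of one detected line (PaddleOCR 2.x style result item):
# [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], (text, confidence)]
Detection = Tuple[Sequence[Sequence[float]], Tuple[str, float]]


def to_annotations(detections: List[Detection], min_conf: float = 0.9):
    """Convert detections to annotation dicts and list indices to re-check.

    `min_conf` is a hypothetical threshold: anything below it is returned
    in `needs_review` so a person can correct it by hand.
    """
    annotations, needs_review = [], []
    for i, (box, (text, conf)) in enumerate(detections):
        # Round corner coordinates to integers, as in the img_61 example.
        points = [[int(round(x)), int(round(y))] for x, y in box]
        annotations.append({"transcription": text, "points": points})
        if conf < min_conf:
            needs_review.append(i)
    return annotations, needs_review


# Made-up model output for illustration; the second line is a misread.
detections = [
    ([[10.2, 5.0], [120.7, 5.0], [120.7, 30.0], [10.2, 30.0]], ("HSBB Bank", 0.97)),
    ([[10.0, 40.0], [180.0, 40.0], [180.0, 62.0], [10.0, 62.0]], ("Debited amount 9.GM", 0.71)),
]
annotations, needs_review = to_annotations(detections)
print(needs_review)
```

Only the flagged entries then need a human look, which is where most of the time savings over labeling from scratch would come from.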

Aug 12 '22 02:08 andyjiang1116

I have generated and exported the Label.txt files, but later I need to change a few annotations. How do I load Label.txt into the PPOCRLabel tool again?

Sep 22 '22 08:09 anidiatm41

Just put your Label.txt in your data directory.
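Assuming Label.txt follows the detection format quoted in the question (image path, tab, JSON list per line), a handful of fixes can also be scripted instead of reopening the tool. A minimal sketch under that assumption; `parse_labels`, `dump_labels` and `update_transcription` are hypothetical helpers, and the `img/4.png` entry is invented. PPOCRLabel may keep other files in the folder that this does not touch, so keep a backup of Label.txt.

```python
import json
from typing import Dict, List


def parse_labels(text: str) -> Dict[str, List[dict]]:
    """Parse 'image_path<TAB>json_list' lines into {image_path: annotations}."""
    labels: Dict[str, List[dict]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        image_path, payload = line.split("\t", 1)
        labels[image_path] = json.loads(payload)
    return labels


def dump_labels(labels: Dict[str, List[dict]]) -> str:
    """Serialize back to the same one-line-per-image layout."""
    return "".join(
        path + "\t" + json.dumps(anns, ensure_ascii=False) + "\n"
        for path, anns in labels.items()
    )


def update_transcription(labels: Dict[str, List[dict]], image_path: str,
                         old: str, new: str) -> int:
    """Replace `old` with `new` on one image's boxes; return how many changed."""
    changed = 0
    for ann in labels.get(image_path, []):
        if ann.get("transcription") == old:
            ann["transcription"] = new
            changed += 1
    return changed


# Invented example entry with a wrong transcription (last digit missing).
text = ('img/4.png\t[{"transcription": "Invoice no:10004567", '
        '"points": [[0, 0], [320, 0], [320, 32], [0, 32]]}]\n')
labels = parse_labels(text)
update_transcription(labels, "img/4.png", "Invoice no:10004567", "Invoice no:100045678")
print(dump_labels(labels), end="")
```

For a real file, read it with `Path("Label.txt").read_text(encoding="utf-8")` (from `pathlib`) and write `dump_labels(labels)` back the same way.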

Sep 22 '22 08:09 andyjiang1116
