PaddleOCR
Data annotation issue for Custom detection training
Hi team, I am trying to re-train the detection model on my custom data (I want to use ResNet50 and MobileNetV3).
My data is in the CSV format below:
Image_name    Ground_Truth
1.png         bank name
2.png         HSBB Bank
3.png         Debited amount 9.6M
4.png         Invoice no:100045678
. . .
I was following your instructions for detection model training (https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_en/dataset/ocr_datasets_en.md). According to them, I need to annotate my data in the format below before training:
====
1.1 PaddleOCR text detection format annotation
The annotation file formats supported by the PaddleOCR text detection algorithm are as follows, separated by "\t":
" Image file name	Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg	[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
====
I am not sure how to annotate my data in this format so that I can start training. Please help.
@bely66 pls help
You can use the PPOCRLabel tool to annotate your own dataset: https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5/PPOCRLabel
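In case it helps to see the target format in code, here is a minimal Python sketch that builds one label line as described in the quoted docs: the image path, a tab, then the annotations encoded with json.dumps. The helper name `make_label_line` is hypothetical and not part of PaddleOCR, and the box coordinates are simply the ones from the docs example. Your CSV only has the text, so you would still need the four corner points for each text region from somewhere.

```python
import json


def make_label_line(image_path, boxes):
    """Build one detection label line.

    boxes is a list of (transcription, points) pairs, where points is a
    list of [x, y] corner coordinates of the text region.
    """
    annotations = [
        {"transcription": text, "points": [[int(x), int(y)] for x, y in points]}
        for text, points in boxes
    ]
    # ensure_ascii=False keeps non-ASCII transcriptions readable in the file.
    return image_path + "\t" + json.dumps(annotations, ensure_ascii=False)


line = make_label_line(
    "ch4_test_images/img_61.jpg",
    [("MASA", [[310, 104], [416, 141], [418, 216], [312, 179]])],
)
print(line)
```

Writing one such line per image into a text file should give a label file in the shape the docs describe.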
thanks it works!!
I'm curious about the underlying architecture or technique behind the annotation tool...
It uses the PP-OCR model to generate the annotations, and you can then correct any that are wrong.
I have generated and exported the Label.txt files, but now I need to change a few annotations. How do I load Label.txt into the PPOCRLabel tool again?
Just put your Label.txt in your data directory.
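If you want to sanity-check the file before putting it back, a small sketch like the one below can parse it. It assumes Label.txt uses the same tab-separated layout quoted in the question (image path, a tab, then a JSON list of annotations); the function names `parse_label_line` and `load_labels` are made up for illustration and are not part of PPOCRLabel.

```python
import json


def parse_label_line(line):
    """Split one label line into (image_path, annotations)."""
    image_path, payload = line.rstrip("\n").split("\t", 1)
    annotations = json.loads(payload)
    for ann in annotations:
        # Each annotation is expected to carry the text and its corner points.
        if "transcription" not in ann or "points" not in ann:
            raise ValueError(f"incomplete annotation for {image_path}: {ann}")
    return image_path, annotations


def load_labels(label_file):
    """Read a whole label file into a dict keyed by image path."""
    with open(label_file, encoding="utf-8") as f:
        return dict(parse_label_line(line) for line in f if line.strip())


sample = (
    "ch4_test_images/img_61.jpg\t"
    '[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]'
)
image_path, annotations = parse_label_line(sample)
print(image_path, len(annotations))
```

A line that fails to parse here is probably worth fixing before the tool or the training script reads it.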