semi_auto_label
Automatically label raw images or videos in PASCAL VOC format.
Instruction
This is a semi-automatic labeling tool that helps you label data in PASCAL VOC format automatically.
All you need to do is follow 3 steps:
S1: put your material, such as videos or raw pictures, in the designated directory
S2: edit label_config.cfg to choose the model, then run main.py
S3: get the xml files in VOC format automatically
PS: if you want YOLO-format txt files, run train_yolo/voc_annotation.py to convert the format.
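
As a rough illustration of the conversion mentioned in the PS, the sketch below reads one VOC xml and emits a single annotation line of the form `image_path xmin,ymin,xmax,ymax,class_id ...`. That line format, the class order in `CLASSES`, the sample xml, and the helper name `voc_to_yolo_line` are assumptions made for this example; train_yolo/voc_annotation.py may produce a different layout, so check that script before relying on this.

```python
import xml.etree.ElementTree as ET

# Assumed class order for this example; use the order your model was trained with.
CLASSES = ["phone", "cigar", "person", "hat"]

# Made-up VOC annotation used only to demonstrate the conversion.
SAMPLE_XML = """<annotation>
  <filename>demo.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>hat</name>
    <bndbox><xmin>30</xmin><ymin>5</ymin><xmax>80</xmax><ymax>40</ymax></bndbox>
  </object>
</annotation>"""


def voc_to_yolo_line(xml_text, image_path, classes=CLASSES):
    """Turn one VOC annotation into 'path x1,y1,x2,y2,cls x1,y1,x2,y2,cls ...'."""
    root = ET.fromstring(xml_text)
    parts = [image_path]
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in classes:
            continue  # skip types that are not in the class list
        box = obj.find("bndbox")
        coords = [int(float(box.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        parts.append(",".join(str(v) for v in coords + [classes.index(name)]))
    return " ".join(parts)


line = voc_to_yolo_line(SAMPLE_XML, "pics/demo.jpg")
print(line)  # pics/demo.jpg 10,20,110,220,2 30,5,80,40,3
```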
1. Env info
PC env
python >= 3.6.5
Windows 10, macOS, or Linux is fine
pip list
tensorflow-gpu==1.14.0
keras==2.2.5
opencv==4.2.0
lxml==4.4.2
numpy, matplotlib
tips: TF2 also works if you use your own TF2 model, as follows:
(1) replace the `self-define model` and change the function `auto_label_picdir` at line 72 in `main.py`;
(2) git clone the master branch of [keras_retina](https://github.com/fizyr/keras-retinanet), use it to replace the dir `keras-retinanet-disable-tf2-behavior`, and rebuild the env (or you can use `pip install keras_retina`);
build the env
pip install tensorflow-gpu==1.14.0
pip install keras==2.2.5
pip install opencv-python==4.2.0.32 opencv-contrib-python==4.2.0.32 lxml numpy matplotlib tqdm
cd ./keras-retinanet-disable-tf2-behavior
pip install . --user
python setup.py build_ext --inplace
2. Function & Usage
(1) auto-label for multi-object detection
Applies to: videos with continuously moving objects

(2) semi-auto-label with a self-define model
Applies to: pictures with specific objects or certain object types

(3) auto-label for coco types
Applies to: pictures containing some of the coco types

Tutorial video (click the picture below to see the basic usage tutorial)
3. Parameter Settings
You can change the parameters in label_config.cfg
[FILE] -> file locations and the mode to use
xml_dir -> the dir where the generated xml files are placed
pic_dir -> the dir where you put the pictures to label
model_name -> use one of 3 keywords to choose the mode: `MOT`, `SELF_DEFINE`, `RETINA`
[SELF_DEFINE] -> use the pretrained yolov3 to label specific types in pictures
score_threshold -> score threshold for yolov3 detections
involved_classes -> the types that your own pretrained yolov3 model detects
[MOT] -> for the video labeling process
interval -> xml file save interval
set_width -> the width of the output picture
set_height -> the height of the output picture
video -> video path
tracker -> choose one of the MOT trackers to track the object: `csrt/kcf/boosting/mil/tld/medianflow/mosse`
[RETINA] -> use the retina model to label the wanted types among the 80 coco types
retina_weight -> the full path where the pretrained model is stored
coco_classes -> the types you want to label; to label all types, set it to `all`
retina_threshold -> the detection threshold for the model
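
To make the layout of label_config.cfg concrete, here is a minimal sketch that parses such a file with Python's standard `configparser` and picks the section matching `model_name`. Only the section and key names come from the list above; every value in `SAMPLE_CFG`, the `/` separator for `involved_classes`, and the helper `load_settings` are hypothetical, and main.py may read the file differently.

```python
import configparser

# Hypothetical values; only the section and key names are taken from this README.
SAMPLE_CFG = """
[FILE]
xml_dir = ./output/xml
pic_dir = ./input/pics
model_name = RETINA

[SELF_DEFINE]
score_threshold = 0.5
involved_classes = phone/cigar/person/hat

[MOT]
interval = 5
set_width = 640
set_height = 480
video = ./input/demo.mp4
tracker = csrt

[RETINA]
retina_weight = ./load_weight/resnet50_coco_best_v2.1.0.h5
coco_classes = person/car
retina_threshold = 0.6
"""

VALID_MODES = {"MOT", "SELF_DEFINE", "RETINA"}


def load_settings(cfg_text):
    """Return the [FILE] paths plus the options of the selected mode section."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    mode = parser.get("FILE", "model_name").strip().upper()
    if mode not in VALID_MODES:
        raise ValueError("model_name must be one of %s" % sorted(VALID_MODES))
    settings = {
        "mode": mode,
        "xml_dir": parser.get("FILE", "xml_dir"),
        "pic_dir": parser.get("FILE", "pic_dir"),
    }
    # configparser returns every value as a string; convert where needed.
    settings.update(dict(parser[mode]))
    return settings


settings = load_settings(SAMPLE_CFG)
print(settings["mode"], settings["coco_classes"].split("/"),
      float(settings["retina_threshold"]))
```

Validating `model_name` up front means a typo in the config fails immediately instead of partway through a labeling run.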
5. Train and iteration - self-define
- self-define model: YOLOv3
If you choose the self-define mode to label many raw pictures of specific types on your own, you can use training and iteration to shorten the labeling process.
case 1:
I provide all of the yolov3 training py-files in dir /train_yolo. After your own training process, the newly trained model can replace the weight /model/models/best_weight_711.h5 so that you can label pictures of your specific types with the tool. For details, see the README in dir /train_yolo.
Download the pretrained weight yolo_weights.h5 and store it in train_yolo/model_data/ for fine-tuning a weight for your own types with YOLOv3.
tips:
remember to download the pretrained weight `best_weight_711.h5` and store it in dir `model/models/` if you want to semi-auto-label the following 4 types:
(**types: phone/cigar/person/hat**)
case 2:
You can also use your own model to detect objects and generate the VOC xml. To do so, change the code in main.py from line 70 to line 73;
make sure your model is wrapped so that it can be called as follows:
out_boxes, out_scores, out_classes = xxxx_model(image)
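
The sketch below shows one way a model with that call signature could feed a VOC xml writer. It is an illustration under stated assumptions, not the code in main.py: `dummy_model` is a hypothetical stand-in for your detector, the box order `(top, left, bottom, right)` is assumed (adjust it to whatever your model returns), and `detections_to_voc` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

CLASS_NAMES = ["phone", "cigar", "person", "hat"]  # assumed class order


def dummy_model(image):
    """Hypothetical detector returning the three outputs named above."""
    out_boxes = [(20.0, 10.0, 220.0, 110.0), (5.0, 30.0, 40.0, 80.0)]
    out_scores = [0.91, 0.30]
    out_classes = [2, 3]
    return out_boxes, out_scores, out_classes


def detections_to_voc(filename, width, height, boxes, scores, classes,
                      score_threshold=0.5):
    """Build a VOC-style xml string from detections above the threshold."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)
    for (top, left, bottom, right), score, cls in zip(boxes, scores, classes):
        if score < score_threshold:
            continue  # drop low-confidence detections
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = CLASS_NAMES[int(cls)]
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        # Clamp the box to the image so the xml never holds out-of-range pixels.
        values = (max(0, int(round(left))), max(0, int(round(top))),
                  min(width, int(round(right))), min(height, int(round(bottom))))
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), values):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


out_boxes, out_scores, out_classes = dummy_model(image=None)
xml_text = detections_to_voc("demo.jpg", 640, 480,
                             out_boxes, out_scores, out_classes)
print(xml_text)
```

With the sample detections, only the `person` box is written because the second detection scores below the 0.5 threshold.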
Discussion: manual labeling vs. semi-auto labeling
The figures below were measured on a GTX 1060; performance may be much better with a better GPU.
My experience:
manually labeling about 1000 raw pictures takes 1 day of work,
training yolov3 on the 1000 labeled pictures takes about 3-6 hours,
you can then use this weight to auto-label about 5000 raw pictures in about 20 minutes, plus 1-3 hours to refine the xml data,
retraining yolov3 on the 6k labeled pictures takes about 1 day,
with the model trained on 6k pictures, you can auto-label 10k-20k pictures in about 1-2 hours, plus half a day of work to refine the labels.
The final retraining may take about 3 days, after which the model is quite robust, having been trained on more than 20k pictures.
Labeling about 20k raw pictures, for 1 person on an Nvidia GTX 1060, at 8 working hours per day:
| Manual label | semi-label specific types | auto-label video or some of coco data |
|---|---|---|
| 15 days of labeling; 3-4 days of training | 2 days of labeling and refining; 4 days of training | 2-3 hours |
6 Operation - MOT & RETINA
- MOT
press keys on your keyboard to choose the function:
tips:
before beginning, remember to change the type names and their corresponding key numbers in mot.py, lines 13-17
| key in keyboard | function |
|---|---|
| s | start the ROI selection process; follow the reminder at the top and enter the type you want to label |
| Left mouse button | during the ROI process, choose an ROI to track; after choosing one, press Enter to confirm it, otherwise it is cancelled. You can choose many ROIs. After you have chosen them all, press Esc to return to the main process |
| r | press r to clear all the trackers and choose the ROIs again |
| q | press q to quit |
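
For readers curious how a `tracker` name such as `csrt` or `mosse` can map onto OpenCV, here is a small sketch. The `Tracker*_create` factory names are the ones exposed by opencv-contrib-python 4.2; newer OpenCV builds move several of them under `cv2.legacy`, so the lookup checks both places. The helper `resolve_tracker_factory` is a name made up for this example, and mot.py may select trackers differently.

```python
# Tracker names from label_config.cfg mapped to OpenCV factory function names.
TRACKER_FACTORIES = {
    "csrt": "TrackerCSRT_create",
    "kcf": "TrackerKCF_create",
    "boosting": "TrackerBoosting_create",
    "mil": "TrackerMIL_create",
    "tld": "TrackerTLD_create",
    "medianflow": "TrackerMedianFlow_create",
    "mosse": "TrackerMOSSE_create",
}


def resolve_tracker_factory(name, cv2_module):
    """Return the factory function for a tracker name, or raise if unavailable."""
    key = name.strip().lower()
    if key not in TRACKER_FACTORIES:
        raise ValueError("unknown tracker %r, expected one of %s"
                         % (name, sorted(TRACKER_FACTORIES)))
    attr = TRACKER_FACTORIES[key]
    # Look on the top-level module first, then on cv2.legacy if it exists.
    for owner in (cv2_module, getattr(cv2_module, "legacy", None)):
        factory = getattr(owner, attr, None) if owner is not None else None
        if factory is not None:
            return factory
    raise RuntimeError("this OpenCV build does not provide %s" % attr)


# Typical use (requires opencv-contrib-python to be installed):
#   import cv2
#   tracker = resolve_tracker_factory("csrt", cv2)()
#   tracker.init(frame, roi)  # roi is an (x, y, w, h) box on the first frame
```

Passing the module in as an argument keeps the lookup independent of which OpenCV version is installed.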
- RETINA
coco_classes=all -> label all 80 types
coco_classes=person/car -> label only person and car; other types are ignored
the type names are as follows:
0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train',
7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter',
13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant',
21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie',
28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite',
34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket',
39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl',
46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog',
53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed',
60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard',
67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator',
73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier',
79: 'toothbrush'
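
The way `coco_classes` and `retina_threshold` narrow down the detections can be sketched as follows. This is an illustration only: `COCO_NAMES` holds just the first eight entries of the mapping above, the detection tuples are invented, and `wanted_ids` and `filter_detections` are hypothetical helpers rather than functions from this repo.

```python
# First entries of the mapping listed above; extend to all 80 in real use.
COCO_NAMES = {0: "person", 1: "bicycle", 2: "car", 3: "motorcycle",
              4: "airplane", 5: "bus", 6: "train", 7: "truck"}


def wanted_ids(coco_classes, names=COCO_NAMES):
    """Resolve a coco_classes value ('all' or 'a/b/c') to a set of class ids."""
    value = coco_classes.strip()
    if value.lower() == "all":
        return set(names)
    by_name = {name: idx for idx, name in names.items()}
    wanted = [part.strip() for part in value.split("/") if part.strip()]
    unknown = [w for w in wanted if w not in by_name]
    if unknown:
        raise ValueError("unknown COCO class names: %s" % unknown)
    return {by_name[w] for w in wanted}


def filter_detections(detections, coco_classes, threshold):
    """Keep (box, score, class_id) tuples of a wanted class at or above threshold."""
    ids = wanted_ids(coco_classes)
    return [d for d in detections if d[2] in ids and d[1] >= threshold]


# Invented detections: (box, score, class_id)
detections = [
    ((10, 20, 110, 220), 0.92, 0),  # person, confident
    ((5, 5, 60, 40), 0.40, 2),      # car, below threshold
    ((0, 0, 50, 50), 0.88, 1),      # bicycle, not requested
]
kept = filter_detections(detections, "person/car", threshold=0.5)
print(kept)  # [((10, 20, 110, 220), 0.92, 0)]
```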
Pretrained Model
- YOLOV3:
function?
best_weight_711.h5 -> pretrained yolov3 for detecting `phone/cigar/person/hat`
yolo_weights.h5 -> can be used for fine-tuning YOLOv3 for your own types
which dir to store them?
best_weight_711.h5 -> `/model/models/`
yolo_weights.h5 -> `train_yolo/model_data/`
- RetinaNet:
function?
resnet50_coco_best_v2.1.0.h5 -> retinanet model weight for detecting coco types
which dir to store them?
resnet50_coco_best_v2.1.0.h5 -> `/load_weight/`
see the release
TO-DO list
- [ ] ROI selection to crop images of different types in video
- [ ] more efficient object-detection models, such as EfficientNet or scaled-yolov4, to replace the self-define model part
- [ ] operation GUI
- [ ] GAN series for generating new images / style transfer / face swapping / super resolution
- [ ] WSOD for annotation, or heat-map visualization with few manual annotations
- [ ] Vision Transformer to generate specific images
- [ ] incremental learning and self-learning algorithms or methods during the iteration and optimization process