Converts MOT16 and MOT17 annotations to YOLO format.

convert_MOT16_to_yolo

This repo contains a script that converts MOT16 and MOT17 annotations to YOLO format. Since MOT17 has the same images as MOT16 but improved, more accurate annotations, I recommend using MOT17.

How to use the script?

  • Change the 'dataset_root' in convert_to_yolo.m to point to your MOT17/MOT16 location.
  • Run convert_to_yolo from MATLAB.

Outputs

  • You will get YOLO-format annotations in the 'labels' folder.
  • You can also get images with the bounding boxes drawn on them in the 'drawn_img' folder by setting 'VISUALIZE=1' in convert_to_yolo.m.

Note

I converted the pedestrian, person_on_vehicle and static_person classes to a single positive class (labeled '0'). The distractor and reflection classes are converted to a don't-know class (labeled '-1'). You should customize YOLO to ignore examples with the '-1' class while computing the loss.
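The mapping described above can be sketched in Python as follows. This is an illustration only, not the repo's MATLAB code. The numeric class IDs are assumed from the class table in the MOT16 paper, and dropping every other class (cars, bicycles, occluders, etc.) is also an assumption, since the note above does not say what happens to them.

```python
# MOT16 class IDs, assumed from the MOT16 paper (not read from this repo).
PEDESTRIAN, PERSON_ON_VEHICLE, STATIC_PERSON = 1, 2, 7
DISTRACTOR, REFLECTION = 8, 12

POSITIVE = {PEDESTRIAN, PERSON_ON_VEHICLE, STATIC_PERSON}
DONT_KNOW = {DISTRACTOR, REFLECTION}


def mot_class_to_yolo_label(mot_class):
    """Return 0 for person classes, -1 for don't-know classes, None otherwise."""
    if mot_class in POSITIVE:
        return 0
    if mot_class in DONT_KNOW:
        return -1
    # Any other class is assumed to be dropped from the YOLO labels.
    return None


print(mot_class_to_yolo_label(7))   # 0
print(mot_class_to_yolo_label(12))  # -1
```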

Note that the '-1' class is neither a negative nor a positive class. Hence, those objects should be ignored when computing the loss (cost function).
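As a rough illustration of what "ignore" means here, the sketch below averages per-box loss values while skipping boxes labeled '-1'. The function name and the idea of a flat list of per-box losses are hypothetical. A real YOLO loss has several terms (objectness, box regression, classification), and where the mask has to be applied depends on the implementation you use.

```python
def masked_mean_loss(per_box_losses, labels, ignore_label=-1):
    """Average per-box losses, skipping boxes whose label equals ignore_label."""
    kept = [loss for loss, label in zip(per_box_losses, labels)
            if label != ignore_label]
    if not kept:
        # Nothing to learn from in this batch.
        return 0.0
    return sum(kept) / len(kept)


losses = [0.5, 2.0, 1.5]
labels = [0, -1, 0]  # the second box is a don't-know example
print(masked_mean_loss(losses, labels))  # 1.0
```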

What is MOT16/17?

MOT17Det is a dataset for the people detection challenge from MOT (https://motchallenge.net/data/MOT17Det/). It contains 14 videos recorded under different lighting, viewpoint and weather conditions; 7 of them form the training set and the other 7 form the test set. MOT17Det is the improved version of MOT16 (https://arxiv.org/pdf/1603.00831.pdf).

Dataset Statistics

According to Table 3 of https://arxiv.org/pdf/1603.00831.pdf, MOT16 contains ~320,000 person annotations (pedestrian + person_on_vehicle + static_person). It also contains a distractor class (statues, mannequins) and a reflection class (reflections of people in mirrors). These two classes can be ignored by the detector: if the detector detects them, we do not count a false detection, and if it misses them, we do not count a missed detection. That way the detector learns only from 'clean' annotations.
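The counting rule above can be sketched as follows. This is a simplified, hypothetical counter and not the official MOT evaluation: each detection is greedily matched to the unmatched ground-truth box it overlaps most, boxes are assumed to be (xmin, ymin, w, h) tuples, and the 0.5 IoU threshold is an assumed default.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def count_detections(detections, ground_truth, iou_thr=0.5):
    """Return (tp, fp, fn); ground_truth holds (box, label) with label 0 or -1."""
    matched = set()
    tp = fp = 0
    for det in detections:
        best_idx, best_iou = None, iou_thr
        for idx, (box, _) in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(det, box)
            if overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is None:
            fp += 1  # matches nothing, not even a don't-know box
            continue
        matched.add(best_idx)
        if ground_truth[best_idx][1] == 0:
            tp += 1
        # A match with a -1 box counts as neither a hit nor a false detection.
    # Only missed positive boxes count as misses; missed -1 boxes are ignored.
    fn = sum(1 for idx, (_, label) in enumerate(ground_truth)
             if label == 0 and idx not in matched)
    return tp, fp, fn


gt = [((0, 0, 10, 10), 0), ((100, 100, 10, 10), -1), ((200, 200, 10, 10), 0)]
dets = [(0, 0, 10, 10), (100, 100, 10, 10), (300, 300, 10, 10)]
print(count_detections(dets, gt))  # (1, 1, 1)
```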

Table 3

The annotations in this dataset differ from YOLO annotations in three ways:

  • It contains whole video annotation in a single file
  • It contains 12 classes
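  • Its annotations are in [frm_id,seq_id,xmin,ymin,w,h,confidence,class,visibility] format, in absolute pixel units, and not in YOLO's [relative_x, relative_y, relative_w, relative_h] format, where the box center and size are divided by the image width and height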
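To make the first and third differences concrete, here is a minimal Python sketch that splits a per-video annotation file into per-frame groups and converts each box to a normalized YOLO line of the form "label x_center y_center w h". It is not the repo's MATLAB code. The function names are hypothetical, the image size is assumed to be known, every box gets a fixed label instead of one derived from the class column, and boxes that extend past the image border are not clipped.

```python
from collections import defaultdict


def mot_row_to_yolo(row, img_w, img_h, label=0):
    """Convert one MOT row to a YOLO line: 'label x_center y_center w h'."""
    xmin, ymin, w, h = (float(v) for v in row[2:6])
    # YOLO stores the box center, normalized by the image size.
    x_center = (xmin + w / 2.0) / img_w
    y_center = (ymin + h / 2.0) / img_h
    return (f"{label} {x_center:.6f} {y_center:.6f} "
            f"{w / img_w:.6f} {h / img_h:.6f}")


def split_by_frame(gt_lines, img_w, img_h):
    """Group a per-video annotation file into {frame_id: [yolo_line, ...]}."""
    frames = defaultdict(list)
    for line in gt_lines:
        row = line.strip().split(",")
        if len(row) < 9:
            continue  # skip blank or malformed lines
        frames[int(row[0])].append(mot_row_to_yolo(row, img_w, img_h))
    return dict(frames)


# Two made-up rows for illustration, one box in each of two frames.
gt = ["1,1,100,200,50,100,1,1,1.0", "2,1,110,200,50,100,1,1,1.0"]
labels = split_by_frame(gt, img_w=1920, img_h=1080)
for frame_id, lines in sorted(labels.items()):
    print(frame_id, lines)
```

Each key of the returned dictionary would correspond to one label file, since YOLO expects one annotation file per image.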