
How to understand the format in the annotation file?

Open xincyu opened this issue 5 years ago • 14 comments

Hello, the annotation file format looks like this: 684,8,273,116,0,0,0,0

How do I interpret these numbers? Is the format x_min, y_min, x_max, y_max, or something else?

xincyu avatar Dec 03 '19 02:12 xincyu

I still can't figure it out... it seems to use absolute pixel values instead of the usual normalized boundaries. Any chance of converting it to the standard YOLO format?

artynet avatar Dec 11 '19 16:12 artynet

Hi @xincyu and @artynet

The DET submission format, as described by the authors, is as follows:

 <bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>


    Name                                                  Description
-------------------------------------------------------------------------------------------------------------------------------     
 <bbox_left>	     The x coordinate of the top-left corner of the predicted bounding box

 <bbox_top>	     The y coordinate of the top-left corner of the predicted object bounding box

 <bbox_width>	     The width in pixels of the predicted object bounding box

<bbox_height>	     The height in pixels of the predicted object bounding box

   <score>	     The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing 
                     an object instance.
                     The score in the GROUNDTRUTH file is set to 1 or 0: 1 indicates the bounding box is considered in 
                     evaluation, while 0 indicates the bounding box will be ignored.
                      
<object_category>    The object category indicates the type of annotated object (i.e., ignored regions(0), pedestrian(1), 
                     people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10), 
                     others(11))
                      
<truncation>	     The score in the DETECTION result file should be set to the constant -1.
                     The score in the GROUNDTRUTH file indicates the degree to which object parts appear outside the frame 
                     (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).
                      
<occlusion>	     The score in the DETECTION file should be set to the constant -1.
                     The score in the GROUNDTRUTH file indicates the fraction of the object being occluded (i.e., no occlusion = 0 
                     (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2 
                     (occlusion ratio 50% ~ 100%)).

Detections in the ignored regions or labeled as "others" will not be considered in evaluation. A sample submission from the Faster R-CNN detector can be found on our website.
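For concreteness, here is a minimal Python sketch that parses one such line into named fields under the layout above (the parse_det_line helper is illustrative, not part of the official VisDrone toolkit):

    # Minimal sketch: parse one VisDrone DET annotation line into named fields.
    # parse_det_line is a hypothetical helper, not part of the official toolkit.
    FIELDS = ("bbox_left", "bbox_top", "bbox_width", "bbox_height",
              "score", "object_category", "truncation", "occlusion")

    def parse_det_line(line: str) -> dict:
        # Defensively strip whitespace and any trailing comma before splitting.
        values = [int(v) for v in line.strip().rstrip(",").split(",")]
        return dict(zip(FIELDS, values))

    ann = parse_det_line("684,8,273,116,0,0,0,0")
    print(ann["bbox_left"], ann["object_category"])  # 684 0

Note that the example line from the original question has score 0 and object_category 0, i.e., per the table above it marks an ignored region rather than a regular object.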

dronefreak avatar Feb 24 '20 11:02 dronefreak

Hey, we're trying to convert these to the YOLO text format. Any luck converting from this format to that?

adityachintala avatar Nov 05 '22 15:11 adityachintala

Hey @adityachintala, YOLOv5 has a script that allows quick conversion for the VisDrone format to the YOLO format. Check it out here.

adityatandon avatar Nov 14 '22 21:11 adityatandon

Hey @adityatandon, do you know where I can find an explanation of what the script is doing? I see it subtracts 1 from all classes; I thought this was to combine the person and pedestrian classes, but when I train the model using this YAML I'm still getting results for pedestrian.

Gareth1995 avatar Nov 15 '22 09:11 Gareth1995

Hey @Gareth1995, I don't remember seeing a full explanation of the code anywhere. However, I can offer my two cents on what I've understood.

In the original annotations, the first 4 numbers represent the bounding box, the 5th number represents whether the annotation is ignored or considered, and the 6th number represents the class label.

box = convert_box(img_size, tuple(map(int, row[:4]))) pulls the first 4 numbers for the bounding box and converts it to the YOLO bounding box format.

cls = int(row[5]) - 1 reads the class label at position 6 and subtracts 1, since YOLO uses class labels in the range 0-9 while the original VisDrone annotations use class labels in the range 1-10.

This is my understanding of it, hope I was able to be of help.
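Putting those two lines in context, here is a hedged sketch of the overall conversion (based on my reading of the YOLOv5 script; the Pillow image-size lookup, file handling, and function names are illustrative assumptions, not the script verbatim):

    # Sketch of the VisDrone -> YOLO label conversion discussed above.
    # Assumes Pillow is installed; paths and names are illustrative.
    from pathlib import Path
    from PIL import Image

    def convert_box(img_size, box):
        # (left, top, width, height) in pixels -> normalized YOLO (cx, cy, w, h).
        dw, dh = 1.0 / img_size[0], 1.0 / img_size[1]
        left, top, w, h = box
        return (left + w / 2) * dw, (top + h / 2) * dh, w * dw, h * dh

    def convert_annotation(ann_file: Path, img_file: Path, out_file: Path) -> None:
        img_size = Image.open(img_file).size  # (width, height)
        lines = []
        for row in ann_file.read_text().splitlines():
            fields = row.strip().rstrip(",").split(",")
            if fields[4] == "0":      # ground-truth score 0 = ignored region, skip
                continue
            cls = int(fields[5]) - 1  # shift classes 1-10 (VisDrone) to 0-9 (YOLO)
            box = convert_box(img_size, tuple(map(int, fields[:4])))
            lines.append(f"{cls} " + " ".join(f"{x:.6f}" for x in box))
        out_file.write_text("\n".join(lines) + "\n")

The two lines you asked about are exactly the class shift and the box normalization; the rest is file plumbing.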

adityatandon avatar Nov 18 '22 11:11 adityatandon

@Gareth1995 hi, did you train this dataset with YOLOv7? I am still trying to convert the annotations to the YOLOv7 format.

miladnasiri avatar Nov 18 '22 17:11 miladnasiri

@miladnasiri I've only been working with YOLOv5, unfortunately.

Gareth1995 avatar Nov 25 '22 06:11 Gareth1995

Thanks for the reply. Since the label formats for YOLOv5 and YOLOv7 are the same, if you share the script you used to convert the VisDrone dataset to the YOLOv5 format, that would be enough; I can do the rest. I just cannot convert the dataset annotations into annotations suitable for training with YOLOv5 or YOLOv7.

miladnasiri avatar Nov 25 '22 08:11 miladnasiri

Hi @miladnasiri ,

I've attached a ZIP file with the annotations for the VisDrone dataset in both the original format (under the annotations folder) and the converted YOLO format (under the labels folder). I used the YOLOv5 VisDrone YAML file for conversion, which is available here.

Please keep in mind that the converted class labels in the YOLO format are in the 0-9 range.

I have trained both the YOLOv5 and YOLOv7 detectors on the VisDrone dataset with these labels, and it works fine. Hope this is of help to you. VisDrone-YOLO.zip

adityatandon avatar Nov 25 '22 11:11 adityatandon

@adityatandon hi, thanks for your help. The first link is not working. Did you also work with Task 2 of the VisDrone dataset (the video dataset)?

miladnasiri avatar Nov 25 '22 11:11 miladnasiri

Hey @miladnasiri, I just updated my earlier comment to ensure the link is working now. No, I only attempted the detection task with images; I did not do the SOT and MOT tasks with videos.

adityatandon avatar Nov 25 '22 11:11 adityatandon

@adityatandon hi, can you please share the weights (from training with YOLOv7)? This is my email: [email protected]

miladnasiri avatar Nov 28 '22 09:11 miladnasiri

@adityatandon how did you use the YAML file? I am trying to do single object tracking.

nihanaltaytas avatar Dec 07 '22 18:12 nihanaltaytas

It was recently brought to my attention that the annotation files I had uploaded were incomplete. I have since fixed these annotations, and if you're looking for a better way to convert them, I've made a cleaner script available.

I've added the conversion code, as well as the annotations in the YOLO format, to my GitHub repository here. Feel free to use that for your experiments if you'd like.

adityatandon avatar Feb 25 '23 01:02 adityatandon

@adityatandon will I encounter any errors if I try these labels with YOLOv3?

EbubekirGONEY avatar May 07 '23 19:05 EbubekirGONEY

@adityatandon Hey, can you help me?

Will I encounter any errors if I try it with YOLOv3?

EbubekirGONEY avatar May 15 '23 21:05 EbubekirGONEY

189,122,533,23,14,29,1,1,0,0
190,122,533,24,14,29,1,1,0,0

The MOT test dataset format looks like this. Does anyone know what it means? I think it is [frame, ID, bb_top_left_x, bb_top_left_y, width, height, conf, class_label, unknown, unknown], but I can't find it documented online. Can anyone help?
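For what it's worth, here is a tiny sketch that parses a row under that guessed layout (the field names, including the two unknowns, come from the guess above, not from any documentation):

    # Parse a VisDrone MOT row under the guessed field layout above.
    # "unknown_1"/"unknown_2" are placeholders; their meaning is exactly
    # the open question here.
    GUESSED_FIELDS = ("frame", "id", "bb_top_left_x", "bb_top_left_y",
                      "width", "height", "conf", "class_label",
                      "unknown_1", "unknown_2")

    def parse_mot_row(row: str) -> dict:
        return dict(zip(GUESSED_FIELDS, (int(v) for v in row.split(","))))

    print(parse_mot_row("189,122,533,23,14,29,1,1,0,0"))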

bad-engineer avatar May 20 '23 17:05 bad-engineer

@bad-engineer have you found the answer to this?

VisDrone's video detection dev-test set has the following format: 98,0,808,1,47,22,1,4,0,0. I am not sure what it refers to.

Ainecop avatar Sep 27 '23 07:09 Ainecop

Would the conversion code work for YOLOv8? And if I just want to retain only persons, which classes should I remove?

fatbringer avatar Dec 07 '23 08:12 fatbringer