
How to interpret annotation file values in the Object Detection in Videos task?

Haebuk opened this issue 3 years ago • 13 comments

I downloaded the Task 2 dataset and unzipped it, and the annotation files have the format below:

1,0,593,43,174,190,0,0,0,0
2,0,592,43,174,189,0,0,0,0
3,0,592,43,174,189,0,0,0,0
4,0,592,43,174,189,0,0,0,0
5,0,592,43,174,189,0,0,0,0
...

I found the following description:

 <bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>


    Name                                                  Description
-------------------------------------------------------------------------------------------------------------------------------     
 <bbox_left>	     The x coordinate of the top-left corner of the predicted bounding box

 <bbox_top>	     The y coordinate of the top-left corner of the predicted object bounding box

 <bbox_width>	     The width in pixels of the predicted object bounding box

<bbox_height>	     The height in pixels of the predicted object bounding box

   <score>	     The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing 
                     an object instance.
                     The score in GROUNDTRUTH file is set to 1 or 0. 1 indicates the bounding box is considered in evaluation, 
                     while 0 indicates the bounding box will be ignored.
                      
<object_category>    The object category indicates the type of annotated object, (i.e., ignored regions(0), pedestrian(1), 
                     people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10), 
                     others(11))
                      
<truncation>	     The score in the DETECTION result file should be set to the constant -1.
                     The score in the GROUNDTRUTH file indicates the degree to which object parts appear outside a frame 
                     (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).
                      
<occlusion>	     The score in the DETECTION file should be set to the constant -1.
                     The score in the GROUNDTRUTH file indicates the fraction of objects being occluded (i.e., no occlusion = 0 
                     (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2 
                     (occlusion ratio 50% ~ 100%)).

But this description seems quite different from the video annotations. How should I interpret them? Thank you.

Haebuk avatar Oct 12 '21 07:10 Haebuk

Do you already have an answer? I'm desperately trying to make this file format work, but I just don't understand it.

DiegoLigtenberg avatar Nov 05 '21 11:11 DiegoLigtenberg

@DiegoLigtenberg Not yet :(

Haebuk avatar Nov 08 '21 00:11 Haebuk

@DiegoLigtenberg I found this here:

 <frame_index>,<target_id>,<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>

        Name	                                                      Description
 ----------------------------------------------------------------------------------------------------------------------------------
    <frame_index>     The frame index of the video frame

     <target_id>      In the DETECTION result file, the identity of the target should be set to the constant -1. 
                      In the GROUNDTRUTH file, the identity of the target is used to provide the temporal correspondence 
                      of the bounding boxes across different frames.

     <bbox_left>      The x coordinate of the top-left corner of the predicted bounding box

     <bbox_top>	      The y coordinate of the top-left corner of the predicted object bounding box

    <bbox_width>      The width in pixels of the predicted object bounding box

    <bbox_height>     The height in pixels of the predicted object bounding box

      <score>	      The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing 
                      an object instance.
                      The score in GROUNDTRUTH file is set to 1 or 0. 1 indicates the bounding box is considered in 
	              evaluation, while 0 indicates the bounding box will be ignored.

  <object_category>   The object category indicates the type of annotated object, (i.e., ignored regions (0), pedestrian (1), 
                      people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), 
	              others (11))

   <truncation>       The score in the DETECTION file should be set to the constant -1.
                      The score in the GROUNDTRUTH file indicates the degree to which object parts appear outside a frame 
                      (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).

    <occlusion>	      The score in the DETECTION file should be set to the constant -1.
                      The score in the GROUNDTRUTH file indicates the fraction of objects being occluded 
	              (i.e., no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), 
	              and heavy occlusion = 2 (occlusion ratio 50% ~ 100%)).
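A minimal parsing sketch of this 10-field format (Python; the file path and the FIELDS names are illustrative assumptions, not from the VisDrone toolkit):

# Sketch: parse one VisDrone-VID ground-truth file into per-line dicts.
# The path argument and the FIELDS names are illustrative assumptions.
FIELDS = ["frame_index", "target_id", "bbox_left", "bbox_top",
          "bbox_width", "bbox_height", "score", "object_category",
          "truncation", "occlusion"]

def parse_annotations(path):
    records = []
    with open(path) as f:
        for line in f:
            values = [int(v) for v in line.strip().split(",")]
            records.append(dict(zip(FIELDS, values)))
    return records

# The first sample line above, 1,0,593,43,174,190,0,0,0,0, parses to:
# frame 1, target 0, a 174x190 box at (593, 43), score 0 (ignored in
# evaluation), category 0 (ignored region), no truncation, no occlusion.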

Haebuk avatar Nov 08 '21 01:11 Haebuk


Do you have any idea how to convert this format to YOLOv5 annotations?

RoyCopter avatar Mar 14 '22 09:03 RoyCopter

@RoyCopter, you can write a simple script:

For each sequence (each txt file):
- Load the annotation file
- Extract the unique frame_id values (pd.unique or np.unique)
- Create bbox_center_x and bbox_center_y columns (e.g. bbox_center_x = bbox_left + bbox_width/2)
- Read and store the image width w and height h
- For each frame_id:
    - Select only the lines for that frame_id from the annotation file
    - Divide the bbox_center_x and bbox_width columns by w
    - Divide the bbox_center_y and bbox_height columns by h
    - Save a txt file with ['object_category', 'bbox_center_x', 'bbox_center_y', 'bbox_w', 'bbox_h']

This is just a simple example that completely ignores the truncation and occlusion information. You could use those columns to further process the annotations you want to keep (or mark heavy occlusions as an 'ignored' class). A sketch of this procedure follows below.
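A hedged sketch of that procedure in Python/pandas (the paths, the zero-padded frame naming, and the column names are assumptions, not part of the dataset toolkit):

# Sketch of the VisDrone-VID -> YOLO conversion described above.
# Paths, frame naming, and column names are illustrative assumptions.
import os
import pandas as pd
from PIL import Image

COLS = ["frame_index", "target_id", "bbox_left", "bbox_top", "bbox_width",
        "bbox_height", "score", "object_category", "truncation", "occlusion"]

def convert_sequence(ann_file, seq_dir, out_dir):
    df = pd.read_csv(ann_file, header=None, names=COLS)
    # Box centers in pixels (note: left + width/2, not (left + width)/2).
    df["cx"] = df["bbox_left"] + df["bbox_width"] / 2
    df["cy"] = df["bbox_top"] + df["bbox_height"] / 2
    os.makedirs(out_dir, exist_ok=True)
    for frame_id, g in df.groupby("frame_index"):
        # Assumes frames are stored as zero-padded jpgs, e.g. 0000001.jpg.
        w, h = Image.open(os.path.join(seq_dir, f"{frame_id:07d}.jpg")).size
        out = pd.DataFrame({
            "cls": g["object_category"],
            "cx": g["cx"] / w,
            "cy": g["cy"] / h,
            "bw": g["bbox_width"] / w,
            "bh": g["bbox_height"] / h,
        })
        out.to_csv(os.path.join(out_dir, f"{frame_id:07d}.txt"),
                   sep=" ", header=False, index=False)

Note this keeps every box, including score-0 ('ignored') rows; filtering on score, truncation, and occlusion is left to you.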

saadhimmi avatar Mar 18 '22 09:03 saadhimmi


Thanks!

RoyCopter avatar Apr 14 '22 16:04 RoyCopter

It's available in VisDrone.yaml.

ganesh0074 avatar Jan 10 '23 11:01 ganesh0074

How should I do it if I want to display the bounding boxes and also the targets' annotation IDs?

fatbringer avatar Jun 06 '23 02:06 fatbringer

You need to convert the given annotations into the required format to get the bboxes; there is a function that converts the annotations into the correct format.

ganesh0074 avatar Jun 06 '23 16:06 ganesh0074

@Ganyesh Ooh, where might I find the function? I haven't been able to find it at all.

I'm currently doing it myself by reading the text file line by line and assigning the fields like this:

ID, frame_no, bbox_x, bbox_y, bbox_w, bbox_h, score, obj_class, trunc, occlu = line.split(",")

It seems that the annotation text files are different for each sub-dataset. How do we get around this? I am currently working on the VisDrone MOT dataset.

fatbringer avatar Jun 07 '23 01:06 fatbringer


python train.py --data VisDrone.yaml --epochs 300 --weights '' --cfg yolov5n.yaml --batch-size 128


ganesh0074 avatar Jun 08 '23 10:06 ganesh0074

@fatbringer are you able to get into it?

ganesh0074 avatar Jun 18 '23 18:06 ganesh0074

Hi @Ganyesh, thanks for checking in. Yes, I have solved it. It turns out the correct sequence is: frame_no, ID, bbox_x, bbox_y, bbox_w, bbox_h, score, obj_class, trunc, occlu
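For the earlier question about displaying the boxes together with target IDs, a minimal OpenCV sketch using that field order (the sequence name, paths, and frame naming are illustrative assumptions):

# Sketch: draw ground-truth boxes and target IDs for one frame.
# The paths and frame naming below are illustrative assumptions.
import cv2

FRAME_ID = 1
img = cv2.imread("sequences/uav0000013_00000_v/0000001.jpg")

with open("annotations/uav0000013_00000_v.txt") as f:
    for line in f:
        frame_no, tid, x, y, w, h, score, cls, trunc, occl = \
            [int(v) for v in line.strip().split(",")]
        if frame_no != FRAME_ID:
            continue
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(img, str(tid), (x, max(y - 5, 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

cv2.imwrite("frame_with_ids.jpg", img)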

fatbringer avatar Jun 19 '23 06:06 fatbringer