VisDrone-Dataset
How to interpret the annotation file values in the Object Detection in Videos task?
I downloaded the Task 2 dataset and unzipped it. The annotation files have the format below:
1,0,593,43,174,190,0,0,0,0
2,0,592,43,174,189,0,0,0,0
3,0,592,43,174,189,0,0,0,0
4,0,592,43,174,189,0,0,0,0
5,0,592,43,174,189,0,0,0,0
...
I found the description below:
<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>
Name Description
-------------------------------------------------------------------------------------------------------------------------------
<bbox_left> The x coordinate of the top-left corner of the predicted bounding box
<bbox_top> The y coordinate of the top-left corner of the predicted object bounding box
<bbox_width> The width in pixels of the predicted object bounding box
<bbox_height> The height in pixels of the predicted object bounding box
<score> The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing
an object instance.
The score in the GROUNDTRUTH file is set to 1 or 0: 1 indicates the bounding box is considered in evaluation,
while 0 indicates the bounding box will be ignored.
<object_category> The object category indicates the type of annotated object (i.e., ignored regions (0), pedestrian (1),
people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10),
others (11))
<truncation> The score in the DETECTION result file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the degree to which object parts appear outside the frame
(i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).
<occlusion> The score in the DETECTION file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the fraction of the object being occluded (i.e., no occlusion = 0
(occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2
(occlusion ratio 50% ~ 100%)).
But I think this description is quite different from the video annotations. How do I interpret them? Thank you.
Do you already have an answer? I'm desperately trying to make this file format work, but I just don't understand it.
@DiegoLigtenberg Not yet :(
@DiegoLigtenberg I found this here:
<frame_index>,<target_id>,<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>
Name Description
----------------------------------------------------------------------------------------------------------------------------------
<frame_index> The frame index of the video frame
<target_id> In the DETECTION result file, the identity of the target should be set to the constant -1.
In the GROUNDTRUTH file, the identity of the target is used to provide the temporal correspondence
of the bounding boxes across different frames.
<bbox_left> The x coordinate of the top-left corner of the predicted bounding box
<bbox_top> The y coordinate of the top-left corner of the predicted object bounding box
<bbox_width> The width in pixels of the predicted object bounding box
<bbox_height> The height in pixels of the predicted object bounding box
<score> The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing
an object instance.
The score in the GROUNDTRUTH file is set to 1 or 0: 1 indicates the bounding box is considered in
evaluation, while 0 indicates the bounding box will be ignored.
<object_category> The object category indicates the type of annotated object (i.e., ignored regions (0), pedestrian (1),
people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10),
others (11))
<truncation> The score in the DETECTION file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the degree to which object parts appear outside the frame
(i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).
<occlusion> The score in the DETECTION file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the fraction of the object being occluded
(i.e., no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%),
and heavy occlusion = 2 (occlusion ratio 50% ~ 100%)).
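So for the sample lines at the top of this issue, the ten values map to frame_index, target_id, bbox_left, bbox_top, bbox_width, bbox_height, score, object_category, truncation, occlusion, in that order. A quick way to load an annotation file with named columns (pandas and the file name here are just illustrations, not part of the dataset):

```python
import pandas as pd

cols = ["frame_index", "target_id", "bbox_left", "bbox_top", "bbox_width",
        "bbox_height", "score", "object_category", "truncation", "occlusion"]

# One annotation txt per sequence; the path below is hypothetical
ann = pd.read_csv("annotations/uav_sequence.txt", header=None, names=cols)
print(ann.head())
```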
Do you have any idea how to convert it to YOLOv5 annotations?
@RoyCopter, you can write a simple script:
For each sequence (each txt file):
- Load the annotation file
- Extract the unique frame_id values (`pd.unique` or `np.unique`)
- Create bbox_center_x and bbox_center_y columns (e.g. bbox_center_x = bbox_left + bbox_width/2)
- Read and store the image width `w` and height `h`
- For each frame_id:
  - Select only the relevant frame_id lines from the annotation file
  - Divide the bbox_center_x and bbox_width columns by `w`
  - Divide the bbox_center_y and bbox_height columns by `h`
  - Save a txt file with ['object_category', 'bbox_center_x', 'bbox_center_y', 'bbox_w', 'bbox_h']

This is just a simple example that completely ignores the truncation and occlusion information. You could use these columns to further filter the annotations you want to keep (or mark heavy occlusions as an 'ignored' class). A sketch of these steps is below.
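A minimal sketch of those steps (pandas/PIL usage, the `0000001.jpg` frame naming pattern, and the 0-based class shift are my assumptions, not part of the dataset spec):

```python
from pathlib import Path

import pandas as pd
from PIL import Image

COLS = ["frame_index", "target_id", "bbox_left", "bbox_top", "bbox_width",
        "bbox_height", "score", "object_category", "truncation", "occlusion"]

def convert_sequence(ann_file: Path, seq_dir: Path, out_dir: Path) -> None:
    """Convert one VisDrone video annotation file to per-frame YOLOv5 labels."""
    df = pd.read_csv(ann_file, header=None, names=COLS)
    df = df[df["score"] == 1]  # groundtruth score 0 means "ignore in evaluation"

    out_dir.mkdir(parents=True, exist_ok=True)
    for frame_id, g in df.groupby("frame_index"):
        img_path = seq_dir / f"{frame_id:07d}.jpg"  # assumed frame naming
        w, h = Image.open(img_path).size

        # YOLOv5 format: class x_center y_center width height, all normalized
        cx = (g["bbox_left"] + g["bbox_width"] / 2) / w
        cy = (g["bbox_top"] + g["bbox_height"] / 2) / h
        bw = g["bbox_width"] / w
        bh = g["bbox_height"] / h
        cls = g["object_category"] - 1  # shift 1-based categories to 0-based

        lines = (f"{c} {x:.6f} {y:.6f} {ww:.6f} {hh:.6f}"
                 for c, x, y, ww, hh in zip(cls, cx, cy, bw, bh))
        (out_dir / f"{frame_id:07d}.txt").write_text("\n".join(lines))
```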
Thanks!
It's available in VisDrone.yaml.
How should I do it if I want to display the bounding boxes and also the targets' annotation IDs?
You need to convert the given annotations into the required format to get the bounding boxes; there is a function which converts the annotations into the correct one.
@Ganyesh Ooh, where might I find the function? I haven't been able to find it at all.
I'm currently doing it myself by reading the text file line by line and assigning the fields like so:
ID, frame_no, bbox_x, bbox_y, bbox_w, bbox_h, score, obj_class, trunc, occlu = line.split(",")
It seems that the annotation text files are different for each sub-dataset. How do we get around this? I am currently working on the VisDrone MOT dataset.
You can train on VisDrone with:
python train.py --data VisDrone.yaml --epochs 300 --weights '' --cfg yolov5n.yaml --batch-size 128
@fatbringer Were you able to get into it?
Hi @Ganyesh, thanks for checking in.
Yes, I have solved it. It turns out the correct sequence is:
frame_no, ID, bbox_x, bbox_y, bbox_w, bbox_h, score, obj_class, trunc, occlu
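For anyone else who wants to display the boxes with their target IDs, a minimal OpenCV sketch using that field order (the paths and frame naming pattern are assumptions about how the sequence frames are laid out):

```python
import cv2

seq_dir = "sequences/my_sequence"          # hypothetical frame folder
ann_path = "annotations/my_sequence.txt"   # hypothetical annotation file

# Group groundtruth boxes by frame number
boxes = {}
with open(ann_path) as f:
    for line in f:
        frame_no, tid, x, y, w, h, score, obj_class, trunc, occlu = map(int, line.split(","))
        boxes.setdefault(frame_no, []).append((tid, x, y, w, h))

# Draw the boxes and target IDs on one frame
frame_no = 1
img = cv2.imread(f"{seq_dir}/{frame_no:07d}.jpg")  # assumed frame naming
for tid, x, y, w, h in boxes.get(frame_no, []):
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(img, str(tid), (x, max(y - 5, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
cv2.imwrite("frame_with_ids.jpg", img)
```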