mmdetection3d
Understanding tracklet_labels.xml to KITTI label format conversion properly
My own dataset keeps getting 0 mAP, so I'd like to check whether I understood the tracklet_labels.xml information correctly when creating a label file for each point cloud frame in my training set. I'm using only point cloud data, so I can't use any calibration or camera data.
The dimension information is in camera coordinates, so it has to be converted to velodyne coordinates, which is done here in KittiDataset:
https://github.com/open-mmlab/mmdetection3d/blob/c8347b7ed933d70fcfbfb73a3541046b8c8e8f5e/mmdet3d/datasets/kitti_dataset.py#L196
Assuming that I do not use camera or calibration data, is it correct to replace the conversion matrix rect @ Trv2c with np.identity(4)? The line looks as follows in my dataset:
gt_bboxes_3d = CameraInstance3DBoxes(gt_bboxes_3d).convert_to(self.box_mode_3d, np.identity(4))
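To make the question concrete, here is a minimal NumPy sketch of what an identity matrix implies for a single point. It assumes the nominal KITTI axis conventions (velodyne: x forward, y left, z up; camera: x right, y down, z forward) and ignores translation and rectification. The names `VELO_TO_CAM_AXES` and `transform` are hypothetical helpers for illustration, not part of mmdetection3d, and the sketch does not reproduce what `convert_to` does internally.

```python
import numpy as np

# Assumption: nominal KITTI axis conventions, rotation only.
#   x_cam = -y_velo, y_cam = -z_velo, z_cam = x_velo
VELO_TO_CAM_AXES = np.array([
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])


def transform(points_xyz, mat):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    pts = np.asarray(points_xyz, dtype=float)
    homo = np.hstack([pts, np.ones((pts.shape[0], 1))])
    return (homo @ mat.T)[:, :3]


# Location of the Pedestrian in frame 000000, taken from <tx>, <ty>, <tz>.
point = np.array([[1.0, -2.83, -1.5]])

# With np.identity(4) the coordinates are passed through unchanged.
unchanged = transform(point, np.identity(4))

# With an axis remap the same point gets different coordinates.
remapped = transform(point, VELO_TO_CAM_AXES)
```

With the identity matrix the point keeps its coordinates, so whichever axis convention the label values were written in is the one the boxes end up in, as far as this matrix is concerned.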
I'm using CVAT to label my point cloud data. For example, with two point cloud frames (000000.bin and 000001.bin), each containing one object of the Pedestrian class, the tracklet_labels.xml looks as follows:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE boost_serialization>
<boost_serialization version="9" signature="serialization::archive">
<tracklets version="0" tracking_level="0" class_id="0">
<count>2</count>
<item_version>1</item_version>
<item version="1" tracking_level="0" class_id="1">
<objectType>Pedestrian</objectType>
<h>0.48</h>
<w>0.47</w>
<l>1.98</l>
<first_frame>1</first_frame>
<poses version="0" tracking_level="0" class_id="2">
<count>1</count>
<item_version>0</item_version>
<item version="1" tracking_level="0" class_id="3">
<tx>1.03</tx>
<ty>-2.85</ty>
<tz>-1.47</tz>
<rx>0.0</rx>
<ry>0.0</ry>
<rz>0.0</rz>
<state>2</state>
<occlusion>0</occlusion>
<occlusion_kf>0</occlusion_kf>
<truncation>0</truncation>
<amt_occlusion>-1</amt_occlusion>
<amt_border_l>-1</amt_border_l>
<amt_border_r>-1</amt_border_r>
<amt_occlusion_kf>-1</amt_occlusion_kf>
<amt_border_kf>-1</amt_border_kf>
</item>
</poses>
<finished>1</finished>
</item>
<item>
<objectType>Pedestrian</objectType>
<h>0.46</h>
<w>0.53</w>
<l>1.93</l>
<first_frame>0</first_frame>
<poses>
<count>1</count>
<item_version>0</item_version>
<item>
<tx>1.0</tx>
<ty>-2.83</ty>
<tz>-1.5</tz>
<rx>0.0</rx>
<ry>0.0</ry>
<rz>0.0</rz>
<state>2</state>
<occlusion>0</occlusion>
<occlusion_kf>0</occlusion_kf>
<truncation>0</truncation>
<amt_occlusion>-1</amt_occlusion>
<amt_border_l>-1</amt_border_l>
<amt_border_r>-1</amt_border_r>
<amt_occlusion_kf>-1</amt_occlusion_kf>
<amt_border_kf>-1</amt_border_kf>
</item>
</poses>
<finished>1</finished>
</item>
</tracklets>
</boost_serialization>
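The structure above can be read with a short script. This is a sketch using Python's standard `xml.etree.ElementTree`, under the assumption (following the KITTI raw-data devkit convention) that a tracklet's poses belong to consecutive frames starting at `<first_frame>`, so pose number i maps to frame `first_frame + i`. The function name `group_poses_by_frame` and the dictionary keys are hypothetical, and `SAMPLE_XML` is a trimmed stand-in for the real file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed-down stand-in for tracklet_labels.xml (only the tags read below).
SAMPLE_XML = """<boost_serialization>
<tracklets>
<count>2</count>
<item_version>1</item_version>
<item>
<objectType>Pedestrian</objectType>
<h>0.48</h><w>0.47</w><l>1.98</l>
<first_frame>1</first_frame>
<poses>
<count>1</count>
<item_version>0</item_version>
<item><tx>1.03</tx><ty>-2.85</ty><tz>-1.47</tz>
<rx>0.0</rx><ry>0.0</ry><rz>0.0</rz></item>
</poses>
</item>
<item>
<objectType>Pedestrian</objectType>
<h>0.46</h><w>0.53</w><l>1.93</l>
<first_frame>0</first_frame>
<poses>
<count>1</count>
<item_version>0</item_version>
<item><tx>1.0</tx><ty>-2.83</ty><tz>-1.5</tz>
<rx>0.0</rx><ry>0.0</ry><rz>0.0</rz></item>
</poses>
</item>
</tracklets>
</boost_serialization>"""


def group_poses_by_frame(xml_text):
    """Return {frame_index: [object dicts]} for every pose in the file."""
    root = ET.fromstring(xml_text)
    frames = defaultdict(list)
    # Each direct <item> child of <tracklets> is one tracked object.
    for tracklet in root.find("tracklets").findall("item"):
        first_frame = int(tracklet.findtext("first_frame"))
        dims = tuple(float(tracklet.findtext(tag)) for tag in ("h", "w", "l"))
        # Assumption: the i-th pose belongs to frame first_frame + i.
        for offset, pose in enumerate(tracklet.find("poses").findall("item")):
            frames[first_frame + offset].append({
                "type": tracklet.findtext("objectType"),
                "dimensions_hwl": dims,
                "location": tuple(
                    float(pose.findtext(tag)) for tag in ("tx", "ty", "tz")),
                "rotation": tuple(
                    float(pose.findtext(tag)) for tag in ("rx", "ry", "rz")),
            })
    return dict(frames)


frames = group_poses_by_frame(SAMPLE_XML)
```

Under that assumption, the tracklet with `<first_frame>0</first_frame>` lands in frame 0 and the one with `<first_frame>1</first_frame>` lands in frame 1, each with a single pose.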
Thus, I interpreted the tags as follows (compared to the KITTI format):
- type = <objectType>
- truncated = <truncation>
- occluded = <occlusion>
- alpha = N/A → defaults to -10 (?)
- bbox = N/A → defaults to 0.0 0.0 0.0 0.0
- dimensions = <h> <w> <l>
- location = <tx> <ty> <tz>
- rotation_y = <ry>
- score = N/A
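The mapping above can be written down as a small formatter. This sketch only mirrors the KITTI field order (type, truncated, occluded, alpha, four bbox values, dimensions, location, rotation_y) and uses the placeholder values from the list as defaults. The function name `kitti_label_line` is hypothetical, and whether the values need a coordinate conversion first is exactly the open question here, so the code passes them through unchanged.

```python
def kitti_label_line(obj_type, dimensions_hwl, location_xyz, rotation_y,
                     truncated=0.0, occluded=0, alpha=-10.0,
                     bbox=(0.0, 0.0, 0.0, 0.0)):
    """Format one object as a 15-field KITTI label line (no score)."""
    fields = [
        obj_type,
        f"{truncated:.2f}",
        str(int(occluded)),
        f"{alpha:.2f}",  # placeholder: no camera, so no observation angle
    ]
    fields += [f"{v:.2f}" for v in bbox]            # placeholder 2D box
    fields += [f"{v:.2f}" for v in dimensions_hwl]  # <h> <w> <l>
    fields += [f"{v:.2f}" for v in location_xyz]    # <tx> <ty> <tz>
    fields.append(f"{rotation_y:.2f}")              # <ry> per the list above
    return " ".join(fields)


# Values of the Pedestrian with <first_frame>0</first_frame>.
line = kitti_label_line(
    "Pedestrian",
    dimensions_hwl=(0.46, 0.53, 1.93),
    location_xyz=(1.0, -2.83, -1.5),
    rotation_y=0.0,
)
```

Keeping the placeholders as keyword defaults makes it easy to change a single field, for example the alpha value, without touching the rest of the line.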
Does the <first_frame> tag indicate the frame index? What does the <poses> tag mean here? Compared to the KITTI test file from the raw data development kit, I only get 1 pose per frame.
With that, the label file for the first point cloud frame (000000.txt) would have the following content:
Pedestrian 0.0 0 0.0 0.0 0.0 0.0 0.0 0.46 0.53 1.93 1.0 -2.83 -1.5 0.0
Is that a correct conversion to the label file? Many thanks in advance!