mmdetection3d

Understanding tracklet_labels.xml to KITTI label format conversion properly

Open holtvogt opened this issue 2 years ago • 0 comments

My own dataset keeps getting 0 mAP, so I'd like to check whether I understood the tracklet_labels.xml information correctly when creating a label file for each point cloud frame in my training set. I'm using only point cloud data, so I have no calibration or camera data to work with.

The dimension information is in camera coordinates, so it has to be converted to velodyne coordinates, which is done here in KittiDataset: https://github.com/open-mmlab/mmdetection3d/blob/c8347b7ed933d70fcfbfb73a3541046b8c8e8f5e/mmdet3d/datasets/kitti_dataset.py#L196 Assuming that I do not use camera or calibration data, is it correct to replace the conversion matrix rect @ Trv2c with np.identity(4)? The line looks as follows in my dataset:

gt_bboxes_3d = CameraInstance3DBoxes(gt_bboxes_3d).convert_to(self.box_mode_3d, np.identity(4))
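To make the question concrete, here is a NumPy-only sketch (it does not call mmdetection3d) of what each choice of matrix does to a homogeneous point. The axis swap is an assumption based on the commonly documented KITTI convention (camera: x right, y down, z forward; velodyne: x forward, y left, z up) with the translation ignored; it is not a real calibration matrix, and the name `velo_to_cam` is hypothetical.

```python
import numpy as np

# Object centre taken from the tracklet example (tx, ty, tz), homogeneous form.
center = np.array([1.0, -2.83, -1.5, 1.0])

# Identity: leaves coordinates untouched, i.e. it assumes the labels are
# already expressed in the target coordinate frame.
identity = np.identity(4)

# Assumed KITTI-style axis swap velodyne -> camera, translation ignored:
# x_cam = -y_velo, y_cam = -z_velo, z_cam = x_velo.
velo_to_cam = np.array([
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

unchanged = identity @ center
as_camera = velo_to_cam @ center
back_to_velo = np.linalg.inv(velo_to_cam) @ as_camera
```

If the annotation tool already exports positions in the point cloud frame, the identity leaves them as they are; if the values were really in camera coordinates, a permutation of this kind would still be needed.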

I'm using CVAT to label my point cloud data. For example, with two point cloud frames (000000.bin and 000001.bin), each containing one object of the Pedestrian class, the tracklet_labels.xml looks as follows:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE boost_serialization>
<boost_serialization version="9" signature="serialization::archive">
<tracklets version="0" tracking_level="0" class_id="0">
  <count>2</count>
  <item_version>1</item_version>
  <item version="1" tracking_level="0" class_id="1">
    <objectType>Pedestrian</objectType>
    <h>0.48</h>
    <w>0.47</w>
    <l>1.98</l>
    <first_frame>1</first_frame>
    <poses version="0" tracking_level="0" class_id="2">
      <count>1</count>
      <item_version>0</item_version>
      <item version="1" tracking_level="0" class_id="3">
        <tx>1.03</tx>
        <ty>-2.85</ty>
        <tz>-1.47</tz>
        <rx>0.0</rx>
        <ry>0.0</ry>
        <rz>0.0</rz>
        <state>2</state>
        <occlusion>0</occlusion>
        <occlusion_kf>0</occlusion_kf>
        <truncation>0</truncation>
        <amt_occlusion>-1</amt_occlusion>
        <amt_border_l>-1</amt_border_l>
        <amt_border_r>-1</amt_border_r>
        <amt_occlusion_kf>-1</amt_occlusion_kf>
        <amt_border_kf>-1</amt_border_kf>
      </item>
    </poses>
    <finished>1</finished>
  </item>
  <item>
    <objectType>Pedestrian</objectType>
    <h>0.46</h>
    <w>0.53</w>
    <l>1.93</l>
    <first_frame>0</first_frame>
    <poses>
      <count>1</count>
      <item_version>0</item_version>
      <item>
        <tx>1.0</tx>
        <ty>-2.83</ty>
        <tz>-1.5</tz>
        <rx>0.0</rx>
        <ry>0.0</ry>
        <rz>0.0</rz>
        <state>2</state>
        <occlusion>0</occlusion>
        <occlusion_kf>0</occlusion_kf>
        <truncation>0</truncation>
        <amt_occlusion>-1</amt_occlusion>
        <amt_border_l>-1</amt_border_l>
        <amt_border_r>-1</amt_border_r>
        <amt_occlusion_kf>-1</amt_occlusion_kf>
        <amt_border_kf>-1</amt_border_kf>
      </item>
    </poses>
    <finished>1</finished>
  </item>
</tracklets>
</boost_serialization>

I interpreted the tags as follows (compared to the KITTI format):

  • type = <objectType>
  • truncated = <truncation>
  • occluded = <occlusion>
  • alpha = N/A → defaults to -10 (?)
  • bbox = N/A → defaults to 0.0 0.0 0.0 0.0
  • dimensions = <h> <w> <l>
  • location = <tx> <ty> <tz>
  • rotation_y = <ry>
  • score = N/A

Does the <first_frame> tag indicate the frame index? What does the <poses> tag mean here? In contrast to the KITTI test file from the raw data development kit, I only get 1 pose per frame.

With this mapping, the label file for the first point cloud frame (000000.txt) would contain the following:

Pedestrian 0.0 0 0.0 0.0 0.0 0.0 0.0 0.46 0.53 1.93 1.0 -2.83 -1.5 0.0
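The mapping described above can be sketched with the Python standard library as follows. This is an illustration under stated assumptions, not the script actually used: it assumes that <first_frame> is the frame index of a tracklet's first pose and that each further pose belongs to the next consecutive frame. The function name `tracklets_to_kitti` and the `alpha` placeholder parameter are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def tracklets_to_kitti(xml_text, alpha="-10"):
    """Return {frame_index: [label_line, ...]} for a tracklet_labels.xml string."""
    root = ET.fromstring(xml_text)
    labels = defaultdict(list)
    # Each direct <item> child of <tracklets> is one tracked object.
    for tracklet in root.find("tracklets").findall("item"):
        obj_type = tracklet.findtext("objectType")
        h, w, l = (tracklet.findtext(tag) for tag in ("h", "w", "l"))
        first_frame = int(tracklet.findtext("first_frame"))
        # Assumption: pose number i belongs to frame first_frame + i.
        for offset, pose in enumerate(tracklet.find("poses").findall("item")):
            fields = [
                obj_type,
                pose.findtext("truncation"),
                pose.findtext("occlusion"),
                alpha,                       # alpha: not available, placeholder
                "0.0", "0.0", "0.0", "0.0",  # 2D bbox: not available
                h, w, l,                     # dimensions
                pose.findtext("tx"),         # location
                pose.findtext("ty"),
                pose.findtext("tz"),
                pose.findtext("ry"),         # rotation_y
            ]
            labels[first_frame + offset].append(" ".join(fields))
    return dict(labels)
```

Each list in the returned dictionary would be written to the label file named after its frame index, e.g. the entry for key 0 to 000000.txt. Values are copied as strings, so a truncation of `0` in the XML stays `0` rather than `0.0`.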

Is that the correct conversion to the label file? Many thanks in advance!

holtvogt, Aug 07 '22 15:08