kitti_ros
[Evaluation][3D Object] 3D object detection performance evaluation
- [ ] Implement this as an evaluation ROS node
- Following the KITTI 3D object detection performance evaluation method, use AP (Average Precision) for quantitative analysis
| #Values | Name | Description |
|---|---|---|
| 3 | dimensions | 3D object dimensions: height, width, length (in meters) |
| 3 | location | 3D object location x, y, z in camera coordinates (in meters) |
| 1 | rotation_y | Rotation ry around the Y-axis in camera coordinates [-pi..pi] |
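As a minimal sketch of how these seven values might be held in code, the struct below stores them and reads them from a whitespace-separated string. `Box3D` and `parseBox3D` are hypothetical names, not part of the KITTI devkit, and the input is assumed to contain only these seven numbers in the order listed above.

```cpp
#include <sstream>
#include <string>

// Hypothetical container for the seven values listed above.
struct Box3D {
  double h, w, l;  // dimensions: height, width, length (meters)
  double x, y, z;  // location in camera coordinates (meters)
  double ry;       // rotation around the camera Y-axis, in [-pi, pi]
};

// Reads "h w l x y z ry" from a whitespace-separated string.
// Returns false if fewer than seven numbers could be parsed.
bool parseBox3D(const std::string& text, Box3D& box) {
  std::istringstream in(text);
  return static_cast<bool>(in >> box.h >> box.w >> box.l
                              >> box.x >> box.y >> box.z >> box.ry);
}
```

An evaluation node could fill one such struct per ground-truth object and per detection before computing overlaps.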
- Let us see why this makes sense. Suppose you built an algorithm that is extremely good at detecting the backs of cars. Such an algorithm has near 100% precision, because every car it reports is actually present in the image. But it is poor at detecting cars seen from the side, and it should be penalized for that. This is where recall comes in: we should evaluate not just the overall precision but the precision at different recall values. In this example the algorithm has low recall because it misses many of the cars; say the recall is r = 0.2. Now consider the equation for AP. The first bins (r = 0, 0.1) have a near 100% interpolated precision, but the bins following r = 0.2 have very low interpolated precision. Averaging over all the recall bins, we find that AP is low in this example. To summarize, a model has to give good precision at all recall values to get a high AP score.
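The example above can be made concrete with a small sketch. It assumes 11 recall bins (r = 0, 0.1, ..., 1.0) and takes the interpolated precision of a bin to be the highest precision observed at any recall at or beyond that bin, or 0 if the detector never reaches it. `interpolatedAP` is a hypothetical helper and the numbers are idealized for illustration.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// 11-point interpolated AP over (recall, precision) points.
// For each recall bin r in {0, 0.1, ..., 1.0}, take the highest precision
// observed at any recall >= r (0 if there is none), then average the 11 values.
double interpolatedAP(const std::vector<std::pair<double, double>>& pr_points) {
  double sum = 0.0;
  for (int bin = 0; bin <= 10; ++bin) {
    const double r = bin / 10.0;
    double best = 0.0;
    for (const auto& p : pr_points) {  // p.first = recall, p.second = precision
      if (p.first >= r) best = std::max(best, p.second);
    }
    sum += best;
  }
  return sum / 11.0;
}
```

For an idealized back-of-car detector with points (0.1, 1.0) and (0.2, 1.0), the bins r = 0, 0.1 and 0.2 contribute 1.0 and the remaining eight contribute 0, so AP = 3/11 ≈ 0.27 even though every reported detection is correct.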
AP formula
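The formula itself is not reproduced in this text. Assuming the 11-point interpolated form, which matches the recall bins r = 0, 0.1, ... used in the discussion above, it can be written as:

$$
\mathrm{AP} = \frac{1}{11} \sum_{r \in \{0,\, 0.1,\, \ldots,\, 1\}} p_{\mathrm{interp}}(r),
\qquad
p_{\mathrm{interp}}(r) = \max_{\tilde{r} \,\ge\, r} p(\tilde{r})
$$

Here p(r̃) is the measured precision at recall r̃, and the max makes the interpolated precision non-increasing in r.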
vector<double> getThresholds(vector<double>& v, double n_groundtruth) {
  // holds scores needed to compute N_SAMPLE_PTS recall values
  vector<double> t;

  // sort scores in descending order
  // (highest score is assumed to give best/most confident detections)
  sort(v.begin(), v.end(), greater<double>());

  // get scores for linearly spaced recall
  double current_recall = 0;
  for (int32_t i = 0; i < v.size(); i++) {
    // check if the right-hand-side recall is closer to the current recall than the left-hand-side one;
    // in this case, skip the current detection score
    double l_recall, r_recall, recall;
    l_recall = (double) (i + 1) / n_groundtruth;
    if (i < (v.size() - 1))
      r_recall = (double) (i + 2) / n_groundtruth;
    else
      r_recall = l_recall;

    if ((r_recall - current_recall) < (current_recall - l_recall) && i < (v.size() - 1))
      continue;

    // left recall is the best approximation, so use this and go to the next recall step for approximation
    recall = l_recall;

    // the next recall step was reached: keep this score as a threshold
    t.push_back(v[i]);
    current_recall += 1.0 / (N_SAMPLE_PTS - 1.0);
  }
  return t;
}
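To see which scores are kept as thresholds, the following self-contained sketch restates the routine with the number of recall sample points passed in as a parameter. `sampleThresholds` and `n_sample_pts` are hypothetical names introduced here for illustration; the code above uses the constant `N_SAMPLE_PTS` instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Picks the detection scores whose recall is closest to each of
// n_sample_pts linearly spaced recall targets in [0, 1].
std::vector<double> sampleThresholds(std::vector<double> scores,
                                     double n_groundtruth,
                                     int n_sample_pts) {
  std::vector<double> kept;
  std::sort(scores.begin(), scores.end(), std::greater<double>());
  double current_recall = 0.0;
  for (std::size_t i = 0; i < scores.size(); ++i) {
    const bool is_last = (i + 1 == scores.size());
    const double l_recall = (i + 1) / n_groundtruth;
    const double r_recall = is_last ? l_recall : (i + 2) / n_groundtruth;
    // Skip this score if the next detection lands closer to the target recall.
    if (!is_last && (r_recall - current_recall) < (current_recall - l_recall))
      continue;
    kept.push_back(scores[i]);
    current_recall += 1.0 / (n_sample_pts - 1.0);
  }
  return kept;
}
```

With the eight scores 0.9, 0.8, ..., 0.2, eight ground-truth objects and five sample points, this sketch keeps 0.9, 0.8, 0.6, 0.4 and 0.2.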
// get scores that must be evaluated for recall discretization
thresholds = getThresholds(v, n_gt);

// iterate on every frame of data
for (int32_t i = 0; i < groundtruth.size(); i++) {
  // for all scores/recall thresholds do:
  for (int32_t t = 0; t < thresholds.size(); t++) {
    tPrData tmp = tPrData();
    tmp = computeStatistics(current_class, groundtruth[i], detections[i], dontcare[i],
                            ignored_gt[i], ignored_det[i], true, boxoverlap, metric,
                            compute_aos, thresholds[t], t == 38);

    // add no. of TP, FP, FN, AOS for current frame to total evaluation for current threshold
    pr[t].tp += tmp.tp;
    pr[t].fp += tmp.fp;
    pr[t].fn += tmp.fn;
    if (tmp.similarity != -1)
      pr[t].similarity += tmp.similarity;
  }
}
// compute recall, precision and AOS
vector<double> recall;
precision.assign(N_SAMPLE_PTS, 0);
if (compute_aos)
  aos.assign(N_SAMPLE_PTS, 0);
double r = 0;
for (int32_t i = 0; i < thresholds.size(); i++) {
  r = pr[i].tp / (double) (pr[i].tp + pr[i].fn);
  recall.push_back(r);
  precision[i] = pr[i].tp / (double) (pr[i].tp + pr[i].fp);
  if (compute_aos)
    aos[i] = pr[i].similarity / (double) (pr[i].tp + pr[i].fp);
}
// filter precision and AOS using max_{i..end}(precision)
for (int32_t i = 0; i < thresholds.size(); i++) {
  precision[i] = *max_element(precision.begin() + i, precision.end());
  if (compute_aos)
    aos[i] = *max_element(aos.begin() + i, aos.end());
}
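The filtering step replaces each precision value with the maximum over itself and all later recall points, which makes the curve non-increasing. A minimal sketch on made-up numbers follows; `monotonePrecision` is a hypothetical helper that scans once from the end instead of calling `max_element` per index, which should give the same result.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns precision[i] = max(precision[i..end]) for every i.
std::vector<double> monotonePrecision(std::vector<double> precision) {
  // Walk from the last recall point backwards, carrying the running maximum.
  for (std::size_t i = precision.size(); i > 1; --i) {
    precision[i - 2] = std::max(precision[i - 2], precision[i - 1]);
  }
  return precision;
}
```

For the input {1.0, 0.5, 0.75, 0.5, 0.625} this yields {1.0, 0.75, 0.75, 0.625, 0.625}: the dips at the second and fourth points are lifted to the best precision reachable at a higher recall.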
Compute TP and FP using the IoU overlap with the ground truth
For cars we require a 3D bounding box overlap of 70%, while for pedestrians
we require a 3D bounding box overlap of 50%.
- We use a uniform 50% overlap to decide TP and FP
- Apply max_{i..end}(precision) and compute the mean
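A rough sketch of the TP/FP decision under the 50% rule follows. For brevity it uses axis-aligned 3D boxes, which is a simplification: KITTI boxes are rotated by rotation_y, and a full implementation would intersect the rotated footprints. `AABB`, `iou3d`, `Counts` and `countTpFp` are hypothetical names; detections are assumed to be sorted by descending score, and the don't-care and ignored cases handled by `computeStatistics` are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Axis-aligned 3D box (simplifying assumption: no rotation).
struct AABB { double min[3], max[3]; };

double volume(const AABB& b) {
  return (b.max[0] - b.min[0]) * (b.max[1] - b.min[1]) * (b.max[2] - b.min[2]);
}

// Intersection over union of two axis-aligned boxes.
double iou3d(const AABB& a, const AABB& b) {
  double inter = 1.0;
  for (int k = 0; k < 3; ++k) {
    const double lo = std::max(a.min[k], b.min[k]);
    const double hi = std::min(a.max[k], b.max[k]);
    if (hi <= lo) return 0.0;  // no overlap along this axis
    inter *= hi - lo;
  }
  return inter / (volume(a) + volume(b) - inter);
}

struct Counts { int tp = 0, fp = 0, fn = 0; };

// Greedy matching: each detection takes the unmatched ground-truth box with the
// highest IoU above min_overlap. A matched detection is a TP, an unmatched one
// is an FP, and every ground-truth box left unmatched is an FN.
Counts countTpFp(const std::vector<AABB>& gt, const std::vector<AABB>& det,
                 double min_overlap = 0.5) {
  Counts c;
  std::vector<bool> matched(gt.size(), false);
  for (const AABB& d : det) {
    int best = -1;
    double best_iou = min_overlap;
    for (std::size_t g = 0; g < gt.size(); ++g) {
      if (matched[g]) continue;
      const double iou = iou3d(d, gt[g]);
      if (iou > best_iou) { best_iou = iou; best = static_cast<int>(g); }
    }
    if (best >= 0) { matched[best] = true; ++c.tp; }
    else { ++c.fp; }
  }
  c.fn = static_cast<int>(std::count(matched.begin(), matched.end(), false));
  return c;
}
```

Summing these counts over all frames for each score threshold gives the precision and recall values that feed the AP computation.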