Ultra-Fast-Lane-Detection-v2

What's the meaning of the value

Open Durobert opened this issue 3 years ago • 5 comments

out_tmp_left = (loc_row_left[batch_idx,all_ind_left,cls_idx,lane_idx].softmax(0) * all_ind_left.float()).sum() + 0.5 
out_tmp_left = out_tmp_left / (num_grid-1) * 1640 + 1640./25

out_tmp_up = (loc_col_up[batch_idx,all_ind_up,cls_idx,lane_idx].softmax(0) * all_ind_up.float()).sum() + 0.5 
out_tmp_up = out_tmp_up / (num_grid-1) * 590 + 32./534*590

What is the meaning of the values 1640./25 and 32./534*590? If I change the dataset, how should I set them?

Durobert avatar Aug 04 '22 06:08 Durobert

@Durobert The values 1640./25 and 32./534*590 are the offsets used for test-time augmentation (TTA).

During TTA, we first shift the image and get the prediction for the shifted image. Then we inverse-shift that prediction to recover the correct prediction. That completes one TTA step.

For example, if we shift the image to the left by x pixels, then x must be added back to the predicted coordinates. The difference is that we shift the strided feature map rather than the image itself. If the feature map's width is 25, then shifting the feature map by 1 pixel corresponds to image width * 1 / 25 pixels in the original image space, which is where 1640./25 comes from (1640 is the image width on CULane).

The value 32./534*590 is derived similarly, but that part also involves a crop operation.
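To make the role of the offset concrete, here is a minimal sketch of the horizontal decode step from the question, rewritten for a single lane point with NumPy. The function name `decode_x`, the three-cell window, and `num_grid=201` are illustrative assumptions, not values taken from the repository; only the formula itself follows the snippet quoted in the question.

```python
import numpy as np

def decode_x(logits, indices, num_grid, image_width, offset_px=0.0):
    """Soft-argmax over the selected grid cells, mapped to image pixels.

    offset_px is the TTA correction: the feature-map shift expressed in
    original-image pixels (0.0 when no shift was applied).
    """
    logits = np.asarray(logits, dtype=np.float64)
    indices = np.asarray(indices, dtype=np.float64)
    # Numerically stable softmax over the selected cells.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Expected grid index, moved to the cell centre as in the question's code.
    grid_pos = (weights * indices).sum() + 0.5
    # Scale from grid units to image pixels, then undo the TTA shift.
    return grid_pos / (num_grid - 1) * image_width + offset_px

# Hypothetical inputs: equal scores on cells 99..101 of a 201-cell grid.
x_plain = decode_x([0.0, 0.0, 0.0], [99, 100, 101], num_grid=201, image_width=1640)
x_tta = decode_x([0.0, 0.0, 0.0], [99, 100, 101], num_grid=201, image_width=1640,
                 offset_px=1640.0 / 25)
```

With these made-up inputs the two results differ by exactly the offset 1640/25 = 65.6 pixels, which is the one-cell feature shift expressed in image space.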

cfzd avatar Aug 04 '22 08:08 cfzd

@cfzd For CULane, you resize the image to 1600*320. I use the resnet18 backbone, whose downsampling factor is 32, so the feature map's width is 1600/32=50 and the value should be 1640./50. Is that right?

Durobert avatar Aug 05 '22 06:08 Durobert

@Durobert It should be correct. In fact, another interesting point is that if you always do TTA in both opposite directions with the same shift, you can directly average the shifted predictions without applying any offset and still get the correct result, since (pred - offset) + (pred + offset) = 2*pred.
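The cancellation can be checked with a few lines of arithmetic. This is only a sketch under the idealized assumption that each shifted prediction equals the true coordinate moved by exactly the offset; `true_x` and the function name are hypothetical.

```python
def average_opposite_shifts(pred_shift_left, pred_shift_right):
    """Average two raw predictions from equal and opposite shifts."""
    return (pred_shift_left + pred_shift_right) / 2.0

true_x = 820.0            # hypothetical true lane coordinate in pixels
offset = 1640.0 / 25      # one feature-map cell in image pixels

# Idealized raw predictions before any inverse shift is applied.
pred_shift_left = true_x - offset    # image content moved left
pred_shift_right = true_x + offset   # image content moved right

recovered_x = average_opposite_shifts(pred_shift_left, pred_shift_right)
```

Here `recovered_x` matches `true_x` up to floating-point rounding, so no explicit offset is needed when both directions are always used together.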

cfzd avatar Aug 05 '22 06:08 cfzd

@cfzd Another question, about the value 32./534 * 590: for CULane the crop_ratio is 0.6, so the resized image height is 320/0.6, about 534. Does the value 32./534 * 590 mean that the cropped image height is 32? And if I don't crop, is the value 0?

Durobert avatar Aug 05 '22 07:08 Durobert

@Durobert The offset with the crop operation is a little tricky, and sorry, I have forgotten the details of the derivation. However, the core idea is the same: it just makes sure the shifted prediction is mapped back correctly.

If you don't crop, the logic is the same as for 1640./25. For example, suppose the height of the feature map is hf, the height of the original image is hi, and the feature map is shifted by x pixels; then the offset is x/hf * hi.
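For a different dataset without cropping, the rule x/hf * hi could be wrapped in a small helper like the one below. The name `tta_offset` and the feature-map height of 10 are assumptions for illustration; substitute the feature-map and image sizes of your own setup.

```python
def tta_offset(shift_cells, feat_size, image_size):
    """Convert a feature-map shift into original-image pixels (no crop).

    shift_cells: number of feature-map pixels the feature was shifted by (x)
    feat_size:   feature-map size along the shifted axis (hf or the width)
    image_size:  original image size along the same axis (hi or the width)
    """
    if feat_size <= 0:
        raise ValueError("feat_size must be positive")
    return shift_cells / feat_size * image_size

# Horizontal case from the thread: width 25 feature map, 1640 px image.
offset_x = tta_offset(1, 25, 1640)
# Vertical case with an assumed feature-map height of 10 and a 590 px image.
offset_y = tta_offset(1, 10, 590)
```

The horizontal call reproduces 1640./25; the vertical call only applies when no crop is involved, since the cropped case needs the extra derivation mentioned above.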

cfzd avatar Aug 05 '22 07:08 cfzd