PETR

Can a model trained on one set of extrinsics be applied to another setup?

Open mdfaheem786 opened this issue 1 year ago • 2 comments

Hi, great work! I have learned a lot since I started using it.

I want to understand the significance of the extrinsic matrix (lidar2cam) for model training, given that the 3D points (frustum) and image features are passed to an MLP to generate the position embedding (PE):

How would a change in extrinsics (e.g. a different vehicle) affect model performance?
Can a model trained on one set of extrinsics be applied to another setup by passing the appropriate frustum at inference?
If training is coupled to the extrinsic setup, do you have any thoughts on generalizing to different extrinsic setups?
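To make the question concrete, here is a minimal, dependency-free sketch of how the same pixel/depth frustum maps to different 3D points under different extrinsics. It is an illustration under stated assumptions, not PETR's actual code: the function name `frustum_to_lidar`, the pinhole intrinsics tuple, and the example rotation and translations are all hypothetical.

```python
def frustum_to_lidar(pixels, depths, intrinsics, cam2lidar_rot, cam2lidar_trans):
    """Lift (pixel, depth) frustum samples into the lidar frame.

    Assumptions (hypothetical, for illustration only):
      intrinsics      -- (fx, fy, cx, cy) of an ideal pinhole camera
      cam2lidar_rot   -- 3x3 rotation as nested lists (camera -> lidar)
      cam2lidar_trans -- camera position in the lidar frame, length 3
    """
    fx, fy, cx, cy = intrinsics
    points = []
    for u, v in pixels:
        for d in depths:
            # Back-project the pixel to a 3D point in the camera frame.
            cam = ((u - cx) / fx * d, (v - cy) / fy * d, d)
            # Apply the extrinsic: rotate, then translate into the lidar frame.
            lidar = tuple(
                sum(cam2lidar_rot[i][j] * cam[j] for j in range(3)) + cam2lidar_trans[i]
                for i in range(3)
            )
            points.append(lidar)
    return points


# Camera looks along lidar +x: lidar_x = cam_z, lidar_y = -cam_x, lidar_z = -cam_y.
ROT = [[0.0, 0.0, 1.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]
K = (500.0, 500.0, 320.0, 240.0)
pixels = [(320.0, 240.0), (420.0, 240.0)]
depths = [10.0, 20.0]

# Same pixels and depths, two hypothetical mounting positions.
vehicle_a = frustum_to_lidar(pixels, depths, K, ROT, [1.5, 0.0, 1.6])
vehicle_b = frustum_to_lidar(pixels, depths, K, ROT, [0.5, 0.3, 2.1])
for a, b in zip(vehicle_a, vehicle_b):
    print(a, b)
```

The two vehicles produce different 3D coordinates for identical image locations, so the input to the position-embedding MLP changes with the extrinsics even when the images look alike.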

Mar 02 '23 01:03 mdfaheem786

Hi,

During training, we randomly rotate the extrinsics (https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py#L161) to improve generalization. Therefore, in theory, the model can be used with different extrinsics: a different frustum is generated whenever different extrinsics are input.

However, in practice, the performance degradation is still relatively large. This is due to the different extrinsics as well as other factors:

  1. Domain gap, such as different data scenes and a different camera FOV.
  2. The original PETR also uses a multi-view 2D PE (https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py#L48). It brings little performance improvement, and it also affects speed and generalization performance. We removed it in subsequent versions (StreamPETR).
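As a rough illustration of the random extrinsic rotation mentioned above, the sketch below rotates the scene by a random yaw angle and updates a 4x4 lidar2cam matrix so that projections stay consistent. This is an assumption about the general idea only; the helper names (`yaw_matrix`, `randomly_rotate_extrinsic`), the yaw-only rotation, and the angle range are hypothetical and are not taken from the PETR config.

```python
import math
import random


def yaw_matrix(angle):
    """4x4 homogeneous rotation about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def matmul4(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, point):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


def randomly_rotate_extrinsic(lidar2cam, max_angle, rng=random):
    """Sample a yaw angle and return (new_lidar2cam, angle).

    If the scene (points, boxes) is rotated by R = yaw_matrix(angle), the
    extrinsic must become lidar2cam @ R^-1 so that a rotated point still
    lands on the same camera coordinates as before.
    """
    angle = rng.uniform(-max_angle, max_angle)
    return matmul4(lidar2cam, yaw_matrix(-angle)), angle


# Hypothetical example: identity extrinsic, rotation range of +/- 0.4 rad.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
new_lidar2cam, angle = randomly_rotate_extrinsic(identity, 0.4, random.Random(0))
point = (10.0, 2.0, -1.0)
rotated_point = transform(yaw_matrix(angle), point)
print(transform(identity, point), transform(new_lidar2cam, rotated_point))
```

Each training sample then sees a slightly different extrinsic, and therefore a different frustum, for the same image content, which is one way a model could become less tied to a single camera mounting.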

Mar 02 '23 03:03 yingfei1016

Hi @yingfei1016,

I tried to make it work with the Lyft dataset by changing the config as follows:

  • I wrote my own dataloader to load the images and camera matrices.
  • I wanted to run at half resolution rather than full resolution, so I changed the augmentation settings:
      "final_dim": (640, 960),
      "H": 1280,
      "W": 1920,

Attached are my config and a sample output from PETR (detection only). The detected boxes appear to be shifted and scaled. Could you please point out what I am doing wrong?

Thank you.

__CAM_FRONT_LEFT__host-a101_cam5_1241893240033330006_pred

petrv2_BEVseg_lyf_py.txt

Mar 12 '23 10:03 mdfaheem786