PETR
Can a model trained on one set of extrinsics be applied to another setup?
Hi, great work! I have learned a lot since I started using it.
I want to understand how the extrinsic matrix (lidar2cam) affects model training. Since the 3D points (frustum) and image features are passed to an MLP to generate the position embedding (PE),
how would a change in extrinsics (e.g., a different vehicle) affect model performance?
Can a model trained with one set of extrinsics be applied to another setup by passing the appropriate frustum at inference?
If training is coupled to the extrinsic setup, do you have any thoughts on generalizing to different extrinsic setups?
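To make the question concrete, here is a minimal NumPy sketch of how frustum points can be lifted into the lidar frame, assuming a 4x4 `lidar2img = K @ lidar2cam` composition. The function `frustum_to_lidar` and the camera values are hypothetical, and this is not PETR's own code. It only illustrates that the same (u, v, d) frustum yields different 3D coordinates, and therefore different inputs to the PE MLP, whenever `lidar2cam` changes.

```python
import numpy as np


def frustum_to_lidar(frustum_uvd, intrinsic, lidar2cam):
    """Lift frustum points (u, v, depth) from one camera into the lidar frame.

    frustum_uvd: (N, 3) array of pixel coordinates u, v and depth d.
    intrinsic:   (3, 3) pinhole camera matrix K.
    lidar2cam:   (4, 4) extrinsic mapping lidar points to camera coordinates.
    """
    # Pad K to 4x4 so it composes with the 4x4 extrinsic (assumed convention).
    k4 = np.eye(4)
    k4[:3, :3] = intrinsic
    lidar2img = k4 @ lidar2cam
    u, v, d = frustum_uvd.T
    # Homogeneous image coordinates (u*d, v*d, d, 1) undo the perspective divide.
    homo = np.stack([u * d, v * d, d, np.ones_like(d)], axis=0)
    points = np.linalg.inv(lidar2img) @ homo
    return points[:3].T


# Hypothetical camera: the same frustum point under two different extrinsics.
K = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 640.0], [0.0, 0.0, 1.0]])
frustum = np.array([[960.0, 640.0, 10.0]])
ext_a = np.eye(4)
ext_a[:3, 3] = [0.1, -0.2, 0.3]
ext_b = np.eye(4)
ext_b[:3, 3] = [0.0, 0.5, 0.3]
pts_a = frustum_to_lidar(frustum, K, ext_a)
pts_b = frustum_to_lidar(frustum, K, ext_b)
```

The pixel at the principal point with depth 10 lands at (-0.1, 0.2, 9.7) under `ext_a` but at (0.0, -0.5, 9.7) under `ext_b`, so the PE for identical image content differs between the two setups.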
Hi,
During training, we randomly rotate the extrinsics (https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py#L161) to improve generalization. Therefore, in theory, the model can be used with different extrinsics: when a different extrinsic is input, a different frustum is generated.
However, in practice, the performance degradation is still relatively large. This is due not only to the different extrinsics but also to other factors:
- the domain gap, such as different scenes and different camera FOVs.
- a 2D PE is adopted in the original PETR (https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py#L48). We use a multi-view 2D PE. It brings little performance improvement, and it also hurts speed and generalization. We removed it in subsequent versions (StreamPETR).
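The idea behind the rotation augmentation can be sketched as follows. This is a generic NumPy illustration under our own assumptions, not the transform referenced in the linked config: the lidar-frame scene (points here, and in practice the GT boxes too) is rotated by a random yaw, and the extrinsic is multiplied by the inverse rotation so every point still maps to the same camera coordinates, and therefore to the same pixel. The helper names and the ±π/8 sampling range are hypothetical.

```python
import numpy as np


def yaw_matrix(theta):
    """4x4 homogeneous rotation about the lidar z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.eye(4)
    rot[:2, :2] = [[c, -s], [s, c]]
    return rot


def rotate_scene(points, lidar2cam, theta):
    """Rotate lidar-frame points (N, 3) by a yaw angle and compensate the extrinsic.

    The images are left untouched, so the extrinsic absorbs the inverse
    rotation: each rotated point still maps to the same camera coordinates.
    """
    rot = yaw_matrix(theta)
    homo = np.hstack([points, np.ones((len(points), 1))])
    new_points = (rot @ homo.T).T[:, :3]
    new_lidar2cam = lidar2cam @ np.linalg.inv(rot)
    return new_points, new_lidar2cam


# Hypothetical sampling range for the random yaw.
rng = np.random.default_rng(0)
theta = rng.uniform(-np.pi / 8, np.pi / 8)
```

Because the images stay the same while the extrinsics vary, the network sees many extrinsic/frustum combinations for the same appearance, which is the intuition behind why this augmentation can help the PE tolerate some extrinsic variation.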
Hi @yingfei1016,
I tried to make it work with the Lyft dataset by changing the config as follows:
- wrote my own dataloader to load images and camera matrices
- ran at half resolution rather than full resolution, so I changed the augmentation part:
  "final_dim": (640, 960), "H": 1280, "W": 1920,
Attached are my yml and a sample detection output from PETR only, but the detected boxes appear shifted and scaled. Could you please point out what I am doing wrong?
Thank you,
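One thing worth checking when boxes come out shifted and scaled is whether the intrinsics fed to the model match the resized and cropped images. With `final_dim` (640, 960) and `H`/`W` of 1280/1920, which corresponds to a 0.5 downscale, `fx`, `fy`, `cx` and `cy` would need to be scaled by the same factor (and the principal point shifted by any crop offset) before `lidar2img` is built. The sketch below assumes a 3x3 pinhole `K` and a resize-then-crop order; `adjust_intrinsic` and `build_lidar2img` are hypothetical helpers, and the camera matrix values are made up for illustration.

```python
import numpy as np


def adjust_intrinsic(intrinsic, scale, crop_x=0.0, crop_y=0.0):
    """Update a 3x3 pinhole matrix after resizing an image by `scale`
    and then cropping `crop_x` / `crop_y` pixels from the left / top."""
    k = np.asarray(intrinsic, dtype=float).copy()
    k[:2, :] *= scale   # fx, fy, cx, cy scale with the image
    k[0, 2] -= crop_x   # cropping shifts the principal point
    k[1, 2] -= crop_y
    return k


def build_lidar2img(intrinsic, lidar2cam):
    """Compose a 4x4 lidar-to-image matrix from K and the extrinsic."""
    k4 = np.eye(4)
    k4[:3, :3] = intrinsic
    return k4 @ lidar2cam


# Made-up full-resolution K for a 1920x1280 image; 0.5 matches 960/1920 and 640/1280.
K_full = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 640.0], [0.0, 0.0, 1.0]])
K_half = adjust_intrinsic(K_full, 0.5)
```

If a custom dataloader instead keeps the full-resolution `K` while the pixels are resized, every projection is off by exactly this kind of scale and offset, which would be consistent with the symptom described above.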