
About the robustness and portability of monocular 3D models

WSTao opened this issue 4 years ago · 14 comments

Monocular 3D detection depends on camera parameters. If you switch to a different camera or installation method, the model trained on the original dataset will not work. How can you handle this difference?

WSTao avatar Jan 11 '21 09:01 WSTao

You just need to make sure the calib matrix is formatted correctly; the parameters can vary from camera to camera. We verified this on the nuScenes dataset to prove that it works. We used Model Zoo's DLA34 model (trained only on the KITTI dataset) to get the results without changing any parameters.

Banconxuan avatar Jan 11 '21 09:01 Banconxuan

[attached images: 1_image, 1_bev]

Banconxuan avatar Jan 11 '21 09:01 Banconxuan

We format the calib of nuScenes as:

P0: 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00
P1: 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00
P2: 1.252813102119e+03 0.000000000000e+00 8.265881147814e+02 0.000000000000e+00 0.000000000000e+00 1.252813102119e+03 4.699846626225e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P3: 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00
R0_rect: 1.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00
Tr_velo_to_cam: 1.122025939680e-02 -9.998986137987e-01 -8.767434198194e-03 -7.022340992421e-03 5.464515701519e-02 9.368031550067e-03 -9.984618905094e-01 -3.515059821513e-01 9.984427938514e-01 1.072390359095e-02 5.474472849433e-02 -7.332408994883e-01
Tr_imu_to_velo: 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00

Banconxuan avatar Jan 11 '21 09:01 Banconxuan
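For readers adapting this to their own camera: a minimal sketch (not part of the repo) that writes a KITTI-format calib file from a 3x3 intrinsic matrix `K`, zero-filling the unused entries exactly as in the dump above. The function name `write_kitti_calib` is ours.

```python
import numpy as np

def write_kitti_calib(path, K, velo_to_cam=None):
    """K: 3x3 camera intrinsic matrix. velo_to_cam: optional 3x4 extrinsics."""
    P2 = np.hstack([K, np.zeros((3, 1))])  # P2 = [K | 0], no stereo baseline
    zeros34 = np.zeros((3, 4))
    R0 = np.eye(3)                         # identity if the image is already rectified
    Tr = velo_to_cam if velo_to_cam is not None else np.zeros((3, 4))

    def fmt(name, M):
        return name + ": " + " ".join("%.12e" % v for v in M.flatten())

    lines = [
        fmt("P0", zeros34), fmt("P1", zeros34), fmt("P2", P2), fmt("P3", zeros34),
        fmt("R0_rect", R0), fmt("Tr_velo_to_cam", Tr), fmt("Tr_imu_to_velo", zeros34),
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# Example with (approximately) the nuScenes front-camera intrinsics quoted above:
K = np.array([[1252.8131, 0.0, 826.5881],
              [0.0, 1252.8131, 469.9847],
              [0.0, 0.0, 1.0]])
write_kitti_calib("calib/000000.txt", K)
```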

Thank you very much. In other words, if you use a different camera or a different installation location (so the camera's intrinsic and extrinsic parameters change), you need to remake the training dataset?

WSTao avatar Jan 11 '21 09:01 WSTao

Yes. For now, you can only align your data to KITTI's format.

Banconxuan avatar Jan 11 '21 10:01 Banconxuan

Ok, thanks!

WSTao avatar Jan 12 '21 01:01 WSTao

> You just need to make sure the calib matrix is formatted correctly; the parameters can vary from camera to camera. We verified this on the nuScenes dataset to prove that it works. We used Model Zoo's DLA34 model (trained only on the KITTI dataset) to get the results without changing any parameters.

In KM3D, you reformulate the geometric constraints as a differentiable version used for training. I wonder whether KM3D easily overfits to the training data's camera parameters, but it seems to work well on the nuScenes data. Compared with RTM3D, I wonder whether KM3D generalizes poorly to other datasets. Did you make a comparison?

cch2016 avatar Jan 17 '21 11:01 cch2016
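The "differentiable geometric constraint" idea mentioned above can be illustrated with a toy reprojection loss; this is only a sketch of the general mechanism, not the exact KM3D formulation (which solves for the box from predicted keypoints). `dim` is taken as plain floats here for brevity; `loc`, `ry`, and `P2` are tensors.

```python
import torch

def rot_y(ry):
    # 3x3 rotation about the camera Y axis, differentiable w.r.t. ry
    c, s = torch.cos(ry), torch.sin(ry)
    zero, one = torch.zeros_like(c), torch.ones_like(c)
    return torch.stack([torch.stack([c, zero, s]),
                        torch.stack([zero, one, zero]),
                        torch.stack([-s, zero, c])])

def box_corners(dim, loc, ry):
    # dim = (h, w, l) floats; loc = (x, y, z) bottom-center tensor; returns (8, 3)
    h, w, l = dim
    x = torch.tensor([ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2])
    y = torch.tensor([ 0.,   0.,   0.,   0.,   -h,   -h,   -h,   -h ])
    z = torch.tensor([ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2])
    return (rot_y(ry) @ torch.stack([x, y, z])).T + loc

def reprojection_loss(dim, loc, ry, keypoints_2d, P2):
    corners = box_corners(dim, loc, ry)                  # (8, 3) in camera frame
    hom = torch.cat([corners, torch.ones(8, 1)], dim=1)  # homogeneous (8, 4)
    proj = hom @ P2.T                                    # project with 3x4 P2 -> (8, 3)
    uv = proj[:, :2] / proj[:, 2:3]                      # perspective divide
    return torch.nn.functional.l1_loss(uv, keypoints_2d)
```

Because the projection uses P2 inside the loss, gradients flow from the 2D keypoint error back into the 3D box parameters; that is the sense in which the geometry is "differentiable" during training.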

@cch2016 I have tried to run the pretrained model on my own camera images:

[screenshot: pretrained model run on own camera images]

The cars can only be detected at close range (10–20 m) because the image is cropped internally:

[image: internally cropped input]

I have used the calibration data (projection matrix) from the KITTI dataset (calib/000000.txt).

@Banconxuan Did you use the calibration data (projection matrix) from KITTI or from nuScenes when doing inference on this image?

[image]

walzimmer avatar Feb 10 '21 14:02 walzimmer

@walzimmer You should use the projection matrix from your own dataset. Its generalization is pretty good.

cch2016 avatar Feb 24 '21 03:02 cch2016

Hi @walzimmer, did you successfully get the intended result on your custom dataset? I am currently working on my custom dataset, with cameras at a higher angle. And @cch2016 @Banconxuan, do I need to crop my images to the same size as the KITTI dataset images? The calib parameters that I need to change are P2, R0_rect, and Tr_velo_to_cam, and I should set the other parameters to zero. Is that correct?

a43992899 avatar Mar 09 '21 06:03 a43992899
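On the cropping question, a general camera-geometry point (not something repo-specific): if you resize your images to KITTI's resolution rather than cropping, the intrinsics in P2 must be scaled by the same factors, otherwise projected positions will be off. A minimal sketch:

```python
import numpy as np

def resize_P2(P2, src_hw, dst_hw):
    """Scale a 3x4 projection matrix when the image is resized from src to dst."""
    sy = dst_hw[0] / src_hw[0]
    sx = dst_hw[1] / src_hw[1]
    S = np.diag([sx, sy, 1.0])  # scales fx, cx by sx and fy, cy by sy
    return S @ P2

# e.g. from a 1920x1080 camera to KITTI's roughly 1242x375 canvas (assumed sizes):
# P2_new = resize_P2(P2, (1080, 1920), (375, 1242))
```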

So, if I understand correctly, all you need to change is P2. If you look at the run code for testing, only P2 is being read.

athus1990 avatar Apr 27 '21 22:04 athus1990
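A minimal sketch of what "only P2 is read" amounts to, assuming the standard KITTI calib text layout shown earlier (the helper name `load_P2` is ours, not the repo's):

```python
import numpy as np

def load_P2(calib_path):
    with open(calib_path) as f:
        for line in f:
            if line.startswith("P2:"):
                vals = [float(v) for v in line.split()[1:]]
                return np.array(vals).reshape(3, 4)
    raise ValueError("P2 not found in " + calib_path)

P2 = load_P2("calib/000000.txt")
fx, cx = P2[0, 0], P2[0, 2]  # focal length and principal point (x)
fy, cy = P2[1, 1], P2[1, 2]
```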

Hello, guys! I am also doing some work on the generalization of mono3D methods, and I wonder how the network can learn robustness to camera intrinsics. The depth of instances varies across different cameras and different datasets depending on the camera intrinsics, so I think it will fail on depth estimation (the other 3D box attributes may be fine).

gujiaqivadin avatar May 27 '21 03:05 gujiaqivadin
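The geometry behind this concern, as a worked pinhole example: for an object of real height H and pixel height h, depth is z = f · H / h, so the same image evidence implies different depths under different focal lengths.

```python
# Pinhole relation: h = f * H / z, hence z = f * H / h.
def implied_depth(f_pixels, real_height_m, pixel_height):
    return f_pixels * real_height_m / pixel_height

# The same car (1.5 m tall, 50 px in the image) at two focal lengths:
print(implied_depth(721.5, 1.5, 50))   # ~21.6 m with KITTI's ~721 px focal length
print(implied_depth(1252.8, 1.5, 50))  # ~37.6 m with nuScenes' ~1253 px focal length
```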

Brothers, I used some pictures from nuScenes for inference and modified P2 (the intrinsic parameters of the camera). The results show that the model can identify objects well, but the positions seem to have a large deviation. Is it impossible to infer the positions of objects with a different camera?

Looking at the code, I can see that the network directly outputs the location of each target. Doesn't the model need intrinsic parameters to get the position of an object from a picture? If so, how can the model obtain object positions from pictures taken by different cameras?

If I use a camera with a different focal length, can the model infer accurate object positions?

If you know, please give me some advice. Thank you very much!

[images: nuScenes inference results]

KinkangLiu avatar Oct 30 '21 04:10 KinkangLiu
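One common mitigation for the deviation described above, offered as an assumption rather than something RTM3D/KM3D is confirmed to do: if the network learned depth under KITTI's focal length, rescale its predicted depth by the focal-length ratio of the new camera before back-projecting with the new P2.

```python
# Heuristic focal-length correction (assumption, not the repo's documented behavior):
# the network's depth was learned under KITTI's focal length, so rescale it.
F_KITTI = 721.5377  # fx from KITTI's P2

def correct_depth(z_pred, fx_new, fx_train=F_KITTI):
    return z_pred * fx_new / fx_train
```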

Could you please offer some details about how to train KM3D on the nuScenes dataset to obtain the result in the paper (AP = 15.3)? Thank you.

Ocean-627 avatar Sep 07 '22 03:09 Ocean-627