
Using a different data set for training and evaluation


Hello @cattaneod,

What steps would you recommend for using a different dataset to train and evaluate CMRNet? I understand that CMRNet is scene agnostic, so in theory I could evaluate the pre-trained model against a new dataset.

Thanks

RaviBeagle · Jun 17 '22

Hi @RaviBeagle,

Although CMRNet is scene agnostic, it is not camera agnostic, which means it can only be deployed with the same camera sensor used during training. To overcome this issue, we proposed CMRNet++, which is scene and sensor agnostic, but the code is not publicly available at the moment.

In order to train CMRNet on a different dataset, the first and most important step is to check the quality of the ground truth poses. If the GT poses are not accurate, you probably won't be able to train CMRNet. Then, you should preprocess the dataset by generating the map of every sequence and a local map for each camera image, as done in https://github.com/cattaneod/CMRNet/blob/master/preprocess/kitti_maps.py. As mentioned in the README, the local maps should use this reference frame: X-forward, Y-right, Z-down.
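
For reference, a minimal sketch of that axis change, assuming the local point cloud is already expressed in the camera frame with the usual KITTI convention (X-right, Y-down, Z-forward); the helper name is illustrative, and the actual implementation is in kitti_maps.py:

import numpy as np

def camera_to_cmrnet_axes(points_cam):
    # points_cam: (N, 3) array in the camera frame (X-right, Y-down, Z-forward).
    # Target convention: X-forward, Y-right, Z-down, i.e.
    # new X = old Z, new Y = old X, new Z = old Y.
    perm = np.array([[0., 0., 1.],
                     [1., 0., 0.],
                     [0., 1., 0.]])
    return points_cam @ perm.T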

After the dataset is preprocessed, you should adapt the data loader (https://github.com/cattaneod/CMRNet/blob/master/DatasetVisibilityKitti.py) and make sure to change the camera calibration to your own calibration parameters (the images should also be undistorted beforehand).
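
For the undistortion, a minimal sketch with OpenCV, assuming a plain pinhole model; the intrinsics, distortion coefficients, and file paths below are placeholders that you would replace with your own calibration and data:

import cv2
import numpy as np

K = np.array([[700., 0., 672.],
              [0., 700., 256.],
              [0., 0., 1.]])               # placeholder intrinsics (fx, fy, cx, cy)
dist = np.array([-0.3, 0.1, 0., 0., 0.])   # placeholder distortion coefficients

img = cv2.imread('frame_000000.png')       # hypothetical image path
undistorted = cv2.undistort(img, K, dist)  # remove lens distortion
cv2.imwrite('frame_000000_undistorted.png', undistorted)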

Finally, you should change the image_size parameter to a size suitable for your images (take into account that both width and height should be multiples of 64, due to the architecture of the network).
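
As a quick sanity check for that constraint, a small sketch (not part of the repo) that rounds each dimension down to the nearest multiple of 64:

def nearest_valid_size(width, height, multiple=64):
    # Round down so a centre crop of this size always fits in the original image.
    return (width // multiple) * multiple, (height // multiple) * multiple

print(nearest_valid_size(1242, 375))   # e.g. a raw KITTI-sized image -> (1216, 320)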

cattaneod · Jun 17 '22

Thanks a lot for the hints. We are now preparing our vehicle setup to capture the dataset. The sensor setup is as follows:

  1. Velodyne HDL-16 as our LiDAR for localization. The driving area already has point cloud maps generated with the same sensor.
  2. MYNTEYE S1030 stereo camera. This camera provides grayscale images.

The plan is to run the ROS HDL localization package to generate the GT poses and to capture the sensor data with approximate time synchronization. Do you see any limitations with our setup?
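
For the approximate time synchronization, one common option (outside the scope of the CMRNet code) is to interpolate the localization poses to each camera timestamp, linearly for translation and with SLERP for rotation; a small sketch with SciPy, where all values are dummy placeholders:

import numpy as np
from scipy.spatial.transform import Rotation, Slerp

pose_times = np.array([0.00, 0.10, 0.20])            # localization pose timestamps (s)
positions = np.array([[0., 0., 0.],
                      [1., 0., 0.],
                      [2., 0., 0.]])                  # x, y, z at those timestamps
rotations = Rotation.from_euler('z', [0., 5., 10.], degrees=True)

cam_time = 0.13                                       # camera frame timestamp (s)
cam_pos = np.array([np.interp(cam_time, pose_times, positions[:, i]) for i in range(3)])
cam_rot = Slerp(pose_times, rotations)(cam_time)      # interpolated orientation
print(cam_pos, cam_rot.as_quat())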

RaviBeagle · Jul 21 '22

I'm not familiar with the ROS HDL package, but if the generated GT poses are accurate enough, I don't see any big limitations. CMRNet is intended for monocular localization; since you have a stereo setup, I would guess you could further improve the localization performance by including the second camera.

cattaneod · Jul 28 '22

Hello @cattaneod, due to delays in setting up our vehicle and sensors, we plan in the meantime to test with a synthetic dataset from CARLA (https://github.com/jedeschaud/kitti_carla_simulator) for training. With some modifications to the scripts, we hope to have a dataset similar to KITTI. CMRNet would not need any modifications in that case, if I am right?

Thanks

RaviBeagle · Aug 16 '22

Hello @cattaneod,

I have managed to generate a KITTI-style dataset from the CARLA simulator. Here is one question about a comment you put in the documentation:

"The Data Loader requires a local point cloud for each camera frame, the point cloud must be expressed with respect to the camera_2 reference frame, BUT (very important) with a different axes representation: X-forward, Y-right, Z-down."

Is this done by preprocess/kitti_maps.py, or is it something I have to do at generation time?

RaviBeagle · Sep 09 '22

Yes, it is done in kitti_maps.py#L119

cattaneod · Sep 09 '22

Finally, you should change the image_size parameter to a size suitable for your images (take into account that both width and height should be multiples of 64, due to the architecture of the network).

I have set the RGB image size in CARLA to 1344x512, which is a multiple of 64 in both dimensions.

Yet I get the following error during training:

File "/home/sxv1kor/Temp/CMRNet/DatasetVisibilityKitti.py", line 180, in __getitem__
    img = self.custom_transform(img, img_rotation, h_mirror)
  File "/home/sxv1kor/Temp/CMRNet/DatasetVisibilityKitti.py", line 135, in custom_transform
    rgb = normalization(rgb)
  File "/home/sxv1kor/anaconda3/envs/cmrnet2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sxv1kor/anaconda3/envs/cmrnet2/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 269, in forward
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "/home/sxv1kor/anaconda3/envs/cmrnet2/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 360, in normalize
    return F_t.normalize(tensor, mean=mean, std=std, inplace=inplace)
  File "/home/sxv1kor/anaconda3/envs/cmrnet2/lib/python3.7/site-packages/torchvision/transforms/functional_tensor.py", line 959, in normalize
    tensor.sub_(mean).div_(std)
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0

RaviBeagle · Sep 11 '22

As the error says, your images have 4 channels instead of 3 [R,G,B].

Sorry, I can't really help you with different datasets.

cattaneod · Sep 11 '22

Yes, indeed that was the problem. Thanks! I have saved the images as RGB and the training is now running.
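
For reference, a minimal way to drop the extra alpha channel before training, assuming the frames were written as RGBA PNGs (the path is hypothetical):

from PIL import Image

img = Image.open('frame_000000.png').convert('RGB')   # RGBA -> RGB
img.save('frame_000000.png')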

RaviBeagle · Sep 15 '22

Yes, it is done in kitti_maps.py#L119

Hi @cattaneod, thanks for your amazing work. I wonder how you fetch the surrounding point cloud that forms the local submap? Specifically, I mean here. The code confuses me quite a bit.

whu-lyh · May 23 '23

Hi @cattaneod, thanks for your amazing work. I wonder how you fetch the surrounding point cloud that forms the local submap? Specifically, I mean here. The code confuses me quite a bit.

What exactly is not clear? To generate a local submap around the camera pose, I first transform the global map into a local map, such that the camera pose becomes the origin of the point cloud:

local_map = torch.mm(pose, local_map).t()

Then, I crop the point cloud around the origin (which is now the camera pose). Specifically, I take 25 meters to the left and right, 100 meters in front, and 10 meters behind, to account for the random initial position H_init, which could be behind the real pose:

indexes = local_map[:, 1] > -25.  # keep points up to 25 meters to the left (Y-right)
indexes = indexes & (local_map[:, 1] < 25.)  # and up to 25 meters to the right
indexes = indexes & (local_map[:, 0] > -10.)  # keep points up to 10 meters behind
indexes = indexes & (local_map[:, 0] < 100.)  # and up to 100 meters in front
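
Putting the two steps together, a hedged sketch of the whole operation (the function and variable names are illustrative; the real code lives in kitti_maps.py and the data loader). Here pose is taken as the inverse of the camera-to-map transform, so applying it brings the map into the camera's local frame:

import torch

def crop_local_map(global_map, cam_to_map):
    # global_map: (4, N) homogeneous points in the map frame (X-forward, Y-right, Z-down)
    # cam_to_map: (4, 4) camera pose expressed in the map frame
    pose = cam_to_map.inverse()                    # map -> camera transform
    local_map = torch.mm(pose, global_map).t()     # (N, 4), camera now at the origin
    indexes = local_map[:, 1] > -25.               # keep up to 25 m to the left
    indexes = indexes & (local_map[:, 1] < 25.)    # and up to 25 m to the right
    indexes = indexes & (local_map[:, 0] > -10.)   # up to 10 m behind (for H_init)
    indexes = indexes & (local_map[:, 0] < 100.)   # up to 100 m ahead
    return local_map[indexes]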

cattaneod · May 25 '23

Ohhhhh, I figured it out. Thanks very much!

whu-lyh · May 26 '23