bevfusion When using single image (e.g. KITTI)

Hi, thank you for your great work!

I'm trying to train BEVFusion on the KITTI dataset.

Since KITTI provides only a single image, I filled the other 5 image slots with zero arrays. Accordingly, I set the corresponding calibration matrices to the identity matrix.
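A minimal sketch of the padding described above, using NumPy only. The function name `pad_to_multi_view`, the `slot` parameter, and the placeholder image size and intrinsics are hypothetical and are not taken from either codebase; the real loaders carry more keys than shown here.

```python
import numpy as np


def pad_to_multi_view(image, lidar2cam, cam2img, num_views=6, slot=0):
    """Put one real camera in `slot`; fill the other views with dummies."""
    h, w, c = image.shape
    # Dummy views: all-zero images with identity calibration matrices.
    images = np.zeros((num_views, h, w, c), dtype=image.dtype)
    lidar2cams = np.tile(np.eye(4, dtype=np.float32), (num_views, 1, 1))
    cam2imgs = np.tile(np.eye(4, dtype=np.float32), (num_views, 1, 1))

    # Overwrite the chosen slot with the real image and its calibration.
    images[slot] = image
    lidar2cams[slot] = np.asarray(lidar2cam, dtype=np.float32)
    cam2imgs[slot, :3, :3] = np.asarray(cam2img, dtype=np.float32)[:3, :3]
    return images, lidar2cams, cam2imgs


# Placeholder inputs (assumed values, for illustration only).
image = np.ones((8, 16, 3), dtype=np.uint8)
lidar2cam = np.eye(4, dtype=np.float32)
cam2img = np.array([[700.0, 0.0, 600.0],
                    [0.0, 700.0, 180.0],
                    [0.0, 0.0, 1.0]])

images, lidar2cams, cam2imgs = pad_to_multi_view(image, lidar2cam, cam2img, slot=0)
print(images.shape, lidar2cams.shape, cam2imgs.shape)
```

Changing `slot` from 0 to 2 reproduces the "first" versus "third" placement discussed in this post, which makes it easy to test whether the ordering matters.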

This setup worked successfully with the MMDetection3D version of the BEVFusion codebase.

However, after modifying the code based on this repository, I observed a drop in the fusion model's performance relative to the LiDAR model's performance.

To date, I have not been able to identify a clear reason.

As far as I can tell, the main difference between my current code and the MMDetection3D implementation is the order of the images. (Of course, there could be other differences.)

In MMDetection3D, I put the single image in the third slot (because I was using CAM2), whereas here I put it in the first.

Can this have a significant impact on the results? Or can you speculate on another cause?

Sep 01 '23 08:09 san9569

When I use BEVFusion on our own dataset (also with only one camera), I see a drop in the fusion model's performance. Have you found the reason?

Sep 07 '23 05:09 YangChen1234567

Hello, I am also trying to train BEVFusion on the KITTI dataset, but I keep getting errors. Could you give me some help? I would be very thankful!

Sep 26 '23 07:09 Lcl159

When I use BEVFusion on our own dataset (also with only one camera), I see a drop in the fusion model's performance too. Have you found the reason?

No, I'm using the model from the mmdetection3d repo.

Sep 26 '23 07:09 san9569

Hello, I am also trying to train BEVFusion on the KITTI dataset, but I keep getting errors. Could you give me some help? I would be very thankful!

Hi, training BEVFusion on KITTI requires a fair number of code modifications, and it's not trivial. You have to consider several factors (such as the number of input images, the number of outputs without velocity, etc.).

If you show me the error, I can give you some advice.

Sep 26 '23 07:09 san9569

When I use BEVFusion on our own dataset (also with only one camera), I see a drop in the fusion model's performance too. Have you found the reason?

No, I'm using the model from the mmdetection3d repo.

So after switching to the model from the mmdet3d repo, the fusion model's performance improved?

Sep 26 '23 07:09 YangChen1234567

When I use BEVFusion on our own dataset (also with only one camera), I see a drop in the fusion model's performance too. Have you found the reason?

No, I'm using the model from the mmdetection3d repo.

So after switching to the model from the mmdet3d repo, the fusion model's performance improved?

Yep, I got a reasonable performance improvement.

Sep 26 '23 07:09 san9569

When I use BEVFusion on our own dataset (also with only one camera), I see a drop in the fusion model's performance too. Have you found the reason?

No, I'm using the model from the mmdetection3d repo.

So after switching to the model from the mmdet3d repo, the fusion model's performance improved?

Yep, I got a reasonable performance improvement.

Nice! I have another question: the image size of the KITTI dataset (1392 x 512) is different from nuScenes (1280 x 720). How should I set the resize parameter in data augmentation?

Sep 26 '23 07:09 YangChen1234567

Hello, I am also trying to train BEVFusion on the KITTI dataset, but I keep getting errors. Could you give me some help? I would be very thankful!

Hi, training BEVFusion on KITTI requires a fair number of code modifications, and it's not trivial. You have to consider several factors (such as the number of input images, the number of outputs without velocity, etc.).

If you show me the error, I can give you some advice.

Thanks so much. When I train, the following error occurs:

File "mmdetection3d/projects/BEVFusion/bevfusion/depth_lss.py", line 299, in forward
    cur_coords = cur_img_aug_matrix[:, :3, :3].matmul(cur_coords)
IndexError: too many indices for tensor of dimension 2

Sep 26 '23 07:09 Lcl159

Nice! I have another question: the image size of the KITTI dataset (1392 x 512) is different from nuScenes (1280 x 720). How should I set the resize parameter in data augmentation?

First, please note that the configuration file may differ from the one here, because I use the mmdetection3d repo.

In my case, all input images are resized to (256, 704), which is the input size of the 2D model.
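As a rough illustration of what such a resize implies, the sketch below computes the per-axis scale factors from the sizes quoted in this thread and builds a 4x4 matrix that maps original pixel coordinates to resized ones. It assumes a plain resize with no crop, flip, or rotation; the helper name `resize_aug_matrix` is hypothetical, and the actual augmentation pipeline in either repo may compute its parameters differently.

```python
import numpy as np


def resize_aug_matrix(src_wh, dst_wh):
    """Return (sx, sy) and a 4x4 matrix for a plain resize (assumed: no crop/flip)."""
    src_w, src_h = src_wh
    dst_w, dst_h = dst_wh
    sx = dst_w / src_w
    sy = dst_h / src_h
    mat = np.eye(4)
    mat[0, 0] = sx  # scales the u (column) coordinate
    mat[1, 1] = sy  # scales the v (row) coordinate
    return (sx, sy), mat


# Sizes as quoted in the thread: KITTI 1392 x 512 (W x H), target (256, 704) as (H, W).
(sx, sy), mat = resize_aug_matrix((1392, 512), (704, 256))
print(round(sx, 4), round(sy, 4))

# The bottom-right corner of the source image should land on the target corner.
corner = mat @ np.array([1392.0, 512.0, 1.0, 1.0])
print(corner[:2])
```

The two scale factors differ slightly, so a direct resize to (256, 704) changes the aspect ratio a little; a resize followed by a crop would be the alternative if that matters.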

Sep 26 '23 07:09 san9569

Nice! I have another question: the image size of the KITTI dataset (1392 x 512) is different from nuScenes (1280 x 720). How should I set the resize parameter in data augmentation?

First, please note that the configuration file may differ from the one here, because I use the mmdetection3d repo.

In my case, all input images are resized to (256, 704), which is the input size of the 2D model.

OK, thanks so much!

Sep 26 '23 07:09 YangChen1234567

Thanks so much. When I train, the following error occurs:

File "mmdetection3d/projects/BEVFusion/bevfusion/depth_lss.py", line 299, in forward
    cur_coords = cur_img_aug_matrix[:, :3, :3].matmul(cur_coords)
IndexError: too many indices for tensor of dimension 2

As I remember, cur_img_aug_matrix should have shape (N, 4, 4), where N is the number of input images. With the KITTI dataset, I filled every slot except the real one with dummy zero arrays, because I used a single image. Please check the shape of cur_img_aug_matrix.
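The failure mode can be reproduced in isolation. The sketch below uses NumPy arrays as a stand-in for the torch tensors in depth_lss.py (an assumption made to keep the example dependency-light); the batch size of 2 and the view count of 6 are placeholders. NumPy raises the same exception type, although the wording of the message differs from the torch traceback.

```python
import numpy as np

num_views = 6
good = np.tile(np.eye(4), (2, num_views, 1, 1))  # (B, N, 4, 4)
bad = np.tile(np.eye(4), (2, 1, 1))              # (B, 4, 4): the view axis is missing

cur_good = good[0]  # per-sample matrix, shape (N, 4, 4)
cur_bad = bad[0]    # per-sample matrix, shape (4, 4)

rot = cur_good[:, :3, :3]  # works: three indices on a 3-D array
print(rot.shape)

try:
    cur_bad[:, :3, :3]  # three indices on a 2-D array
    failed = False
except IndexError as err:
    failed = True
    print("IndexError:", err)

# Restoring the view axis (here N = 1) makes the same indexing valid again.
fixed = bad[:, None]  # (B, 1, 4, 4)
print(fixed[0][:, :3, :3].shape)
```

Whether the rest of the model accepts N = 1, or needs the dummy views to reach the configured number of cameras, depends on the config and is not settled by this sketch.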

Sep 26 '23 07:09 san9569

Thanks so much. When I train, the following error occurs:

File "mmdetection3d/projects/BEVFusion/bevfusion/depth_lss.py", line 299, in forward
    cur_coords = cur_img_aug_matrix[:, :3, :3].matmul(cur_coords)
IndexError: too many indices for tensor of dimension 2

As I remember, cur_img_aug_matrix should have shape (N, 4, 4), where N is the number of input images. With the KITTI dataset, I filled every slot except the real one with dummy zero arrays, because I used a single image. Please check the shape of cur_img_aug_matrix.

Yes, but I haven't been able to find it. If it's convenient, could you share the relevant code with me? I'd appreciate it!

Sep 26 '23 07:09 Lcl159

Thanks so much. When I train, the following error occurs:

File "mmdetection3d/projects/BEVFusion/bevfusion/depth_lss.py", line 299, in forward
    cur_coords = cur_img_aug_matrix[:, :3, :3].matmul(cur_coords)
IndexError: too many indices for tensor of dimension 2

As I remember, cur_img_aug_matrix should have shape (N, 4, 4), where N is the number of input images. With the KITTI dataset, I filled every slot except the real one with dummy zero arrays, because I used a single image. Please check the shape of cur_img_aug_matrix.

I will check it, thank you very much!

Sep 26 '23 07:09 YangChen1234567

@Lcl159, oops, sorry, I was confused about it. I did not modify the img_aug_matrix.

If input images of shape (B, N, C, H, W) are processed, the augmentation matrix is assigned here.

So, if you set the input images correctly, cur_img_aug_matrix should be loaded. Please print its shape.

Sep 26 '23 08:09 san9569

@Lcl159, oops, sorry, I was confused about it. I did not modify the img_aug_matrix.

If input images of shape (B, N, C, H, W) are processed, the augmentation matrix is assigned here.

So, if you set the input images correctly, cur_img_aug_matrix should be loaded. Please print its shape.

I think it comes down to how I'm modifying the config file or the loading file, but I can't fix it. Can you tell me what you changed?

Sep 26 '23 08:09 Lcl159

@Lcl159, have you modified LoadMultiViewImageFromFiles here?

Here, you have to put the dummy array.

For example, my modification in the mmdetection3d codebase is as follows.

        filename, cam2img, lidar2cam, cam2lidar, lidar2img = [], [], [], [], []
        for _, cam_item in results["images"].items():
            if isinstance(cam_item, dict):
                if "img_path" in cam_item.keys():
                    filename.append(cam_item["img_path"])
                    lidar2cam.append(cam_item["lidar2cam"])

                    lidar2cam_array = np.array(cam_item["lidar2cam"]).astype(
                        np.float32
                    )
                    lidar2cam_rot = lidar2cam_array[:3, :3]
                    lidar2cam_trans = lidar2cam_array[:3, 3:4]
                    camera2lidar = np.eye(4)
                    camera2lidar[:3, :3] = lidar2cam_rot.T
                    camera2lidar[:3, 3:4] = -1 * np.matmul(
                        lidar2cam_rot.T, lidar2cam_trans.reshape(3, 1)
                    )
                    cam2lidar.append(camera2lidar)

                    cam2img_array = np.eye(4).astype(np.float32)
                    cam2img_array[:3, :3] = np.array(
                        cam_item["cam2img"]
                    ).astype(np.float32)[:3, :3]
                    cam2img.append(cam2img_array)
                    lidar2img.append(cam2img_array @ lidar2cam_array)
                else:
                    filename.append("zero.png")
                    lidar2cam.append(np.eye(4))
                    cam2lidar.append(np.eye(4))
                    cam2img.append(np.eye(4))
                    lidar2img.append(np.eye(4))
            else:
                filename.append("zero.png")
                lidar2cam.append(np.eye(4))
                cam2lidar.append(np.eye(4))
                cam2img.append(np.eye(4))
                lidar2img.append(np.eye(4))

        results["img_path"] = filename
        results["cam2img"] = np.stack(cam2img, axis=0)
        results["lidar2cam"] = np.stack(lidar2cam, axis=0)
        results["cam2lidar"] = np.stack(cam2lidar, axis=0)
        results["lidar2img"] = np.stack(lidar2img, axis=0)

        results["ori_cam2img"] = copy.deepcopy(results["cam2img"])

        # img is of shape (h, w, c, num_views)
        # h and w can be different for different views
        imgs = []
        for name in filename:
            if not name == "zero.png":
                imgs.append(
                    mmcv.imfrombytes(
                        get(name, backend_args=self.backend_args),
                        flag=self.color_type,
                        backend="pillow",
                        channel_order="rgb",
                    )
                )
            else:
                imgs.append(np.zeros((1, 1, 3)))
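The cam2lidar construction in the snippet above relies on the closed-form inverse of a rigid transform: if lidar2cam is [R | t], then cam2lidar is [R^T | -R^T t]. The sketch below checks that identity numerically against `np.linalg.inv`; the helper name `invert_rigid` and the example rotation and translation are illustrative assumptions, not values from KITTI.

```python
import numpy as np


def invert_rigid(lidar2cam):
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    lidar2cam = np.asarray(lidar2cam, dtype=np.float64)
    rot = lidar2cam[:3, :3]
    trans = lidar2cam[:3, 3:4]
    cam2lidar = np.eye(4)
    cam2lidar[:3, :3] = rot.T
    cam2lidar[:3, 3:4] = -rot.T @ trans
    return cam2lidar


# Example transform: 30 degree rotation about z plus a translation (made-up values).
theta = np.deg2rad(30.0)
lidar2cam = np.eye(4)
lidar2cam[:3, :3] = [[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta), np.cos(theta), 0.0],
                     [0.0, 0.0, 1.0]]
lidar2cam[:3, 3] = [0.5, -0.2, 1.0]

cam2lidar = invert_rigid(lidar2cam)
print(np.allclose(cam2lidar, np.linalg.inv(lidar2cam)))
```

For the dummy views the identity matrix is its own inverse, so appending `np.eye(4)` for both lidar2cam and cam2lidar is consistent with this formula.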

Sep 26 '23 08:09 san9569

Here, you have to put the dummy array.

Yes, I changed it. Here is my code; can you help me see what is wrong?

class BEVLoadKittiImageFromFiles(LoadMultiViewImageFromFiles):
"""Load multi channel images from a list of separate channel files.

``BEVLoadMultiViewImageFromFiles`` adds the following keys for the
convenience of view transforms in the forward:
    - 'cam2lidar'
    - 'lidar2img'

Args:
    to_float32 (bool): Whether to convert the img to float32.
        Defaults to False.
    color_type (str): Color type of the file. Defaults to 'unchanged'.
    backend_args (dict, optional): Arguments to instantiate the
        corresponding backend. Defaults to None.
    num_views (int): Number of view in a frame. Defaults to 5.
    num_ref_frames (int): Number of frame in loading. Defaults to -1.
    test_mode (bool): Whether is test mode in loading. Defaults to False.
    set_default_scale (bool): Whether to set default scale.
        Defaults to True.
"""

def transform(self, results: dict) -> Optional[dict]:
    """Call function to load multi-view image from files.

Args:
    results (dict): Result dict containing multi-view image filenames.

Returns:
    dict: The result dict containing the multi-view image data.
    Added keys and values are described below.

        - filename (str): Multi-view image filenames.
        - img (np.ndarray): Multi-view image arrays.
        - img_shape (tuple[int]): Shape of multi-view image arrays.
        - ori_shape (tuple[int]): Shape of original image arrays.
        - pad_shape (tuple[int]): Shape of padded image arrays.
        - scale_factor (float): Scale factor.
        - img_norm_cfg (dict): Normalization configuration of images.
    """
    # TODO: consider split the multi-sweep part out of this pipeline
    # Derive the mask and transform for loading of multi-sweep data
    # Support multi-view images with different shapes
    # TODO: record the origin shape and padded shape
    filename, cam2img, lidar2cam, cam2lidar, lidar2img = [], [], [], [], []
    filename.append(results['images']['CAM2']['img_path'])
    lidar2cam.append(results['images']['CAM2']['lidar2cam'])
    lidar2cam_array = np.array(results['images']['CAM2']['lidar2cam']).astype(np.float32)
    lidar2cam_rot = lidar2cam_array[:3, :3]
    lidar2cam_trans = lidar2cam_array[:3, 3:4]
    camera2lidar = np.eye(4)
    camera2lidar[:3, :3] = lidar2cam_rot.T
    camera2lidar[:3, 3:4] = -1 * np.matmul(
        lidar2cam_rot.T, lidar2cam_trans.reshape(3, 1))
    cam2lidar.append(camera2lidar)
    cam2img_array = np.eye(4).astype(np.float32)
    cam2img_array[:3, :3] = np.array(results['images']['CAM2']['cam2img'])[:3, :3].astype(np.float32)
    cam2img.append(cam2img_array)
    lidar2img.append(cam2img_array @ lidar2cam_array)

    results['img_path'] = filename
    results['cam2img'] = np.stack(cam2img, axis=0)
    results['lidar2cam'] = np.stack(lidar2cam, axis=0)
    results['cam2lidar'] = np.stack(cam2lidar, axis=0)
    results['lidar2img'] = np.stack(lidar2img, axis=0)

    results['ori_cam2img'] = copy.deepcopy(results['cam2img'])

    # img is of shape (h, w, c, num_views)
    # h and w can be different for different views
    img_bytes = [
        get(name, backend_args=self.backend_args) for name in filename]
    imgs = [
        mmcv.imfrombytes(
            img_byte,
            flag=self.color_type,
            backend='pillow',
            channel_order='rgb') for img_byte in img_bytes
    ]
    # handle the image with different shape
    img_shapes = np.stack([img.shape for img in imgs], axis=0)
    img_shape_max = np.max(img_shapes, axis=0)
    img_shape_min = np.min(img_shapes, axis=0)
    assert img_shape_min[-1] == img_shape_max[-1]
    if not np.all(img_shape_max == img_shape_min):
        pad_shape = img_shape_max[:2]
    else:
        pad_shape = None
    if pad_shape is not None:
        imgs = [
            mmcv.impad(img, shape=pad_shape, pad_val=0) for img in imgs
        ]
    img = np.stack(imgs, axis=-1)
    if self.to_float32:
        img = img.astype(np.float32)

    results['filename'] = filename
    # unravel to list, see `DefaultFormatBundle` in formating.py
    # which will transpose each image separately and then stack into array
    results['img'] = [img[..., i] for i in range(img.shape[-1])]
    results['img_shape'] = img.shape[:2]
    results['ori_shape'] = img.shape[:2]
    # Set initial values for default meta_keys
    results['pad_shape'] = img.shape[:2]
    if self.set_default_scale:
        results['scale_factor'] = 1.0
    num_channels = 1 if len(img.shape) < 3 else img.shape[2]
    results['img_norm_cfg'] = dict(
        mean=np.zeros(num_channels, dtype=np.float32),
        std=np.ones(num_channels, dtype=np.float32),
        to_rgb=False)
    results['num_views'] = self.num_views
    results['num_ref_frames'] = self.num_ref_frames
    return results

Sep 26 '23 08:09 Lcl159

@Lcl159, have you modified LoadMultiViewImageFromFiles here?

Here, you have to put the dummy array.

Or can you give me your contact information?

Sep 26 '23 08:09 Lcl159

Lists such as cam2img seem to have only 1 element. Please check the shape of the input image tensor and the corresponding calibration matrices.
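One way to catch this early is to validate the calibration arrays right after loading. The sketch below is a hypothetical helper (`check_view_count` is not part of mmdetection3d) that reports every key whose stacked shape is not (num_views, 4, 4); the view count of 6 is an assumption based on the nuScenes-style setup discussed in this thread.

```python
import numpy as np

CALIB_KEYS = ("cam2img", "lidar2cam", "cam2lidar", "lidar2img")


def check_view_count(results, num_views, keys=CALIB_KEYS):
    """Return {key: shape} for every calibration array with an unexpected shape."""
    problems = {}
    for key in keys:
        shape = np.asarray(results[key]).shape
        if shape != (num_views, 4, 4):
            problems[key] = shape
    return problems


# A loader that appends only the real camera produces arrays of shape (1, 4, 4).
single = {key: np.eye(4)[None] for key in CALIB_KEYS}
# A loader that pads with identity matrices produces arrays of shape (6, 4, 4).
padded = {key: np.tile(np.eye(4), (6, 1, 1)) for key in CALIB_KEYS}

print(check_view_count(single, num_views=6))
print(check_view_count(padded, num_views=6))
```

An empty dictionary means every calibration array has one matrix per view.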

Sep 26 '23 08:09 san9569

Here, you have to put the dummy array.

For example, my modification in the mmdetection3d codebase is as follows.

I made the changes as you described, but I still get that error. Could you tell me the details of your changes in mmdetection3d? I really can't get it to work. Thank you very much!

Sep 26 '23 13:09 Lcl159

@Lcl159 Have you checked the shapes of input image, calibration params, and img aug matrix?

Sep 26 '23 14:09 san9569

@Lcl159 Have you checked the shapes of input image, calibration params, and img aug matrix?

I printed the shapes of the parameters. Can you tell me where I should change it?

Sep 27 '23 01:09 Lcl159

I cannot check the data in my code because I'm not in the lab now.

I remember that the img aug matrix has shape (B, N, 4, 4). However, your matrix has shape (B, 4, 4). So, in the code you provided, cur_img_aug_matrix will be (4, 4), not (N, 4, 4).

Why is the image shape not (B, N, 3, H, W)? The channel dimension is 256. It's weird.

Sep 27 '23 07:09 san9569

I cannot check the data in my code because I'm not in the lab now.

I remember that the img aug matrix has shape (B, N, 4, 4). However, your matrix has shape (B, 4, 4). So, in the code you provided, cur_img_aug_matrix will be (4, 4), not (N, 4, 4).

Why is the image shape not (B, N, 3, H, W)? The channel dimension is 256. It's weird.

I'm confused too. It's been bugging me for a long time; I can't solve the problem, and I've never been able to find out what it is.

Sep 27 '23 07:09 Lcl159

I cannot check the data in my code because I'm not in the lab now.

I remember that the img aug matrix has shape (B, N, 4, 4). However, your matrix has shape (B, 4, 4). So, in the code you provided, cur_img_aug_matrix will be (4, 4), not (N, 4, 4).

Why is the image shape not (B, N, 3, H, W)? The channel dimension is 256. It's weird.

view_transform=dict(
    type='DepthLSSTransform',
    in_channels=256,
    out_channels=80,
    image_size=[256, 704],
    feature_size=[32, 88],
    xbound=[-54.0, 54.0, 0.3],
    ybound=[-54.0, 54.0, 0.3],
    zbound=[-10.0, 10.0, 20.0],
    dbound=[1.0, 60.0, 0.5],
    downsample=2),

Maybe it's because I didn't change it here?
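For orientation, the numbers in this config are mutually consistent in a way that can be checked with simple arithmetic. The sketch below assumes the usual lift-splat convention that each bound is [lower, upper, step] and that the bin count is (upper - lower) / step; the helper `num_bins` is hypothetical, and the exact formula used inside DepthLSSTransform is not confirmed in this thread.

```python
def num_bins(bound):
    """Number of bins for a [lower, upper, step] bound (assumed convention)."""
    lower, upper, step = bound
    return int(round((upper - lower) / step))


# Values copied from the config quoted above.
image_size = [256, 704]
feature_size = [32, 88]
xbound = [-54.0, 54.0, 0.3]
ybound = [-54.0, 54.0, 0.3]
zbound = [-10.0, 10.0, 20.0]
dbound = [1.0, 60.0, 0.5]

# Ratio between the input image and the feature map, per axis.
stride = (image_size[0] // feature_size[0], image_size[1] // feature_size[1])
bev_x, bev_y, z_bins = num_bins(xbound), num_bins(ybound), num_bins(zbound)
depth_bins = num_bins(dbound)

print("stride:", stride)
print("BEV grid:", bev_x, "x", bev_y, "x", z_bins)
print("depth bins:", depth_bins)
```

If the input resolution or the detection range is changed for KITTI, image_size, feature_size, and the bounds would need to stay consistent with each other in the same way.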

Sep 27 '23 11:09 Lcl159

Have you applied BEVFusion to KITTI successfully?

Oct 26 '23 05:10 WuYanXingege

Have you applied BEVFusion to KITTI successfully?

Hello, could I learn from you about how to train on KITTI? I'd like to add you as a contact: v13731082126

Nov 21 '23 08:11 liuyuansheng624

Hello, I haven't gotten it working yet either.

Nov 22 '23 08:11 WuYanXingege

Hello, I haven't gotten it working yet either.

OK, I have already finished it.

Nov 23 '23 12:11 liuyuansheng624

Hello, I haven't gotten it working yet either.

OK, I have already finished it.

Could you share the code? After training with a single camera, the image input of my exported ONNX model still has size (1, 6, 3, 256, 704).

Nov 27 '23 07:11 wangjunquan911
