When using single image (e.g. KITTI)
Hi, thank you for your great work!
I'm trying to train BEVFusion on the KITTI dataset. Since KITTI provides only a single image, I used zero arrays for the other 5 images and set the corresponding calibration matrices to identity matrices.
This setup worked successfully with the BEVFusion implementation in the MMDetection3D codebase. However, after modifying the code based on this repository, I observed a drop in the fusion model's performance relative to the LiDAR-only model's performance. To date, I have not been able to identify a clear reason.
As far as I can tell, the main difference between my current code and the MMDetection3D implementation is the order of the images (though there could be other differences): in MMDetection3D, I put the single image in the third slot (because I was using CAM2), whereas here I put it in the first slot.
Can this have a significant impact on the results? Or can you speculate on another cause?
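For readers trying to reproduce this setup, here is a minimal sketch of the padding scheme described above, assuming a 6-view pipeline with one real camera. The helper name `pad_to_num_views` is mine, not from the codebase:

```python
import numpy as np

def pad_to_num_views(img, cam2img, lidar2cam, num_views=6, real_slot=0):
    """Pad a single-camera sample with zero images and identity matrices.

    img:       (H, W, 3) array of the real camera image.
    cam2img:   (4, 4) intrinsic matrix of the real camera.
    lidar2cam: (4, 4) extrinsic matrix of the real camera.
    """
    imgs = [np.zeros_like(img) for _ in range(num_views)]
    cam2imgs = [np.eye(4, dtype=np.float32) for _ in range(num_views)]
    lidar2cams = [np.eye(4, dtype=np.float32) for _ in range(num_views)]
    # Put the real data into the chosen slot; all other views stay dummy.
    imgs[real_slot] = img
    cam2imgs[real_slot] = cam2img
    lidar2cams[real_slot] = lidar2cam
    return np.stack(imgs), np.stack(cam2imgs), np.stack(lidar2cams)
```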
When I use BEVFusion on my own dataset (also with only one camera), I get a drop in the fusion model's performance. Have you found the reason?
Hello, I am also trying to train BEVFusion on the KITTI dataset, but I keep getting errors. Could you provide me some help? I would be very thankful!
> When I use BEVFusion on my own dataset (only one camera too), I get a drop in the fusion model's performance. Have you found the reason?

No, I'm using the model from the mmdetection3d repo.
> Hello, I am also trying to train BEVFusion on the KITTI dataset, but I keep getting errors.

Hi, training BEVFusion on KITTI requires quite a few code modifications and is not trivial. You have to consider several factors (such as the number of input images, the number of outputs excluding velocity, etc.). If you show me the error, I can give you some advice.
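To illustrate the "outputs excluding velocity" point, this is the kind of head change it implies. This is a hedged sketch against the mmdetection3d BEVFusion project config; the exact keys depend on your version, and `code_size=8` is my assumption (10 minus the 2 velocity dimensions), since KITTI has no velocity annotations:

```python
# Sketch only: KITTI-style detection head without velocity regression.
bbox_head = dict(
    type='TransFusionHead',
    # ... other settings unchanged ...
    common_heads=dict(
        center=[2, 2], height=[1, 2], dim=[3, 2], rot=[2, 2]),  # no vel=[2, 2]
    bbox_coder=dict(
        type='TransFusionBBoxCoder',
        # ...
        code_size=8))  # assumption: 10 - 2 velocity dims
```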
> No, I'm using the model from the mmdetection3d repo.

So after using the model from the mmdet3d repo, the fusion model's performance improved?
> So after using the model from the mmdet3d repo, the fusion model's performance improved?

Yep, I got a reasonable performance improvement.
> Yep, I got a reasonable performance improvement.

Nice! I have another question: the image size of the KITTI dataset (1392 x 512) is different from nuScenes (1280 x 720). How should I set the resize parameters in the data augmentation?
> If you show me the error, I could give you some advice.

Thanks so much. When I train, the following error occurs:

```
File "mmdetection3d/projects/BEVFusion/bevfusion/depth_lss.py", line 299, in forward
    cur_coords = cur_img_aug_matrix[:, :3, :3].matmul(cur_coords)
IndexError: too many indices for tensor of dimension 2
```
> How should I set the resize parameters in the data augmentation?

First, please note that the configuration file may differ from this repo's, because I use the mmdetection3d repo. In my case, all input images are resized to (256, 704), which is the input size of the 2D model.
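For concreteness, here is what that could look like with the `ImageAug3D` transform from the mmdetection3d BEVFusion project. The resize limits below are my own back-of-the-envelope numbers for 1392 x 512 inputs, not validated values:

```python
# Sketch: scale 1392 x 512 KITTI-style images down to a 256 x 704 crop.
# 704 / 1392 ~= 0.506, so the resize range must stay at or above that
# (and 512 * 0.51 ~= 261 >= 256, so the height still covers the crop).
img_aug = dict(
    type='ImageAug3D',
    final_dim=[256, 704],
    resize_lim=[0.51, 0.56],  # assumption: tuned for 1392-wide images
    bot_pct_lim=[0.0, 0.0],
    rot_lim=[-5.4, 5.4],
    rand_flip=True,
    is_train=True)
```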
OK... Thanks so much~
> IndexError: too many indices for tensor of dimension 2

As I recall, `cur_img_aug_matrix` should have the shape (N, 4, 4), where N is the number of input images. With the KITTI dataset, I put dummy zero arrays into it except for the real one, because I used a single image. Please check the shape of `cur_img_aug_matrix`.
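If it helps, here is a small sanity check one could call at the top of `DepthLSSTransform.forward`, before the failing line. This is a sketch only; the argument names are illustrative, and the expected shapes follow the discussion above:

```python
import torch

def check_view_shapes(imgs: torch.Tensor, img_aug_matrix: torch.Tensor,
                      num_views: int) -> None:
    """Sanity-check tensors entering the LSS view transform (sketch).

    Expected, per the discussion above:
      imgs:           (B, N, C, H, W)
      img_aug_matrix: (B, N, 4, 4)
    """
    assert imgs.dim() == 5, f'imgs: {tuple(imgs.shape)}, expected 5 dims'
    # A (B, 4, 4) matrix here loses the view axis; indexing one batch then
    # yields a (4, 4) tensor, which triggers exactly this IndexError.
    assert img_aug_matrix.dim() == 4, \
        f'img_aug_matrix: {tuple(img_aug_matrix.shape)}, expected (B, N, 4, 4)'
    assert imgs.shape[1] == num_views
    assert img_aug_matrix.shape[1] == num_views
```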
> Please check the shape of `cur_img_aug_matrix`.

Yes, but I haven't been able to find the problem. If it's convenient, could you provide me with the relevant code? I'd appreciate it!
I will check it, thank you very much~
@Lcl159, oops, sorry, I was confused about it. I do not modify the img_aug_matrix.
If input images with shape (B, N, C, H, W) are processed, the augmentation matrix is assigned here. So, if you correctly set the input images, `cur_img_aug_matrix` should be loaded. Please print its shape.
> Please print its shape.

I suspect the problem is in my changes to the config file or the loading file, but I can't fix it. Can you tell me what you've changed?
@Lcl159, have you modified `LoadMultiViewImageFromFiles` here? That is where you have to put the dummy arrays.
For example, my modification in the mmdetection3d codebase is as follows.
```python
filename, cam2img, lidar2cam, cam2lidar, lidar2img = [], [], [], [], []
for _, cam_item in results["images"].items():
    if isinstance(cam_item, dict):
        if "img_path" in cam_item.keys():
            # Real camera: load its path and calibration.
            filename.append(cam_item["img_path"])
            lidar2cam.append(cam_item["lidar2cam"])
            lidar2cam_array = np.array(cam_item["lidar2cam"]).astype(np.float32)
            lidar2cam_rot = lidar2cam_array[:3, :3]
            lidar2cam_trans = lidar2cam_array[:3, 3:4]
            # Invert lidar2cam to obtain camera2lidar.
            camera2lidar = np.eye(4)
            camera2lidar[:3, :3] = lidar2cam_rot.T
            camera2lidar[:3, 3:4] = -1 * np.matmul(
                lidar2cam_rot.T, lidar2cam_trans.reshape(3, 1))
            cam2lidar.append(camera2lidar)
            cam2img_array = np.eye(4).astype(np.float32)
            cam2img_array[:3, :3] = np.array(
                cam_item["cam2img"]).astype(np.float32)[:3, :3]
            cam2img.append(cam2img_array)
            lidar2img.append(cam2img_array @ lidar2cam_array)
        else:
            # Missing camera: dummy filename and identity calibration.
            filename.append("zero.png")
            lidar2cam.append(np.eye(4))
            cam2lidar.append(np.eye(4))
            cam2img.append(np.eye(4))
            lidar2img.append(np.eye(4))
    else:
        filename.append("zero.png")
        lidar2cam.append(np.eye(4))
        cam2lidar.append(np.eye(4))
        cam2img.append(np.eye(4))
        lidar2img.append(np.eye(4))

results["img_path"] = filename
results["cam2img"] = np.stack(cam2img, axis=0)
results["lidar2cam"] = np.stack(lidar2cam, axis=0)
results["cam2lidar"] = np.stack(cam2lidar, axis=0)
results["lidar2img"] = np.stack(lidar2img, axis=0)
results["ori_cam2img"] = copy.deepcopy(results["cam2img"])

# img is of shape (h, w, c, num_views)
# h and w can be different for different views
imgs = []
for name in filename:
    if name != "zero.png":
        imgs.append(
            mmcv.imfrombytes(
                get(name, backend_args=self.backend_args),
                flag=self.color_type,
                backend="pillow",
                channel_order="rgb",
            ))
    else:
        # Dummy camera: a 1 x 1 black image (padded later in the pipeline).
        imgs.append(np.zeros((1, 1, 3)))
```
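For completeness, this is roughly how the modified loading transform sits in the data pipeline config. This is a hedged sketch: the type name mirrors the mmdetection3d BEVFusion project, `num_views` may not be exposed in your version, and the omitted entries are indicated:

```python
train_pipeline = [
    dict(
        type='BEVLoadMultiViewImageFromFiles',  # the transform modified above
        to_float32=True,
        color_type='color',
        num_views=6,  # assumption: 1 real camera + 5 dummy 'zero.png' slots
        backend_args=None),
    # ... LoadPointsFromFile, ImageAug3D, etc. ...
]
```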
> In here, you have to put the dummy array.

Yes, I changed it. Here is my code, can you help me see what is wrong?

```python
# Imports assumed from the mmdet3d loading module.
import copy
from typing import Optional

import mmcv
import numpy as np
from mmengine.fileio import get

from mmdet3d.datasets.transforms import LoadMultiViewImageFromFiles


class BEVLoadKittiImageFromFiles(LoadMultiViewImageFromFiles):
    """Load multi channel images from a list of separate channel files.

    ``BEVLoadMultiViewImageFromFiles`` adds the following keys for the
    convenience of view transforms in the forward:
    - 'cam2lidar'
    - 'lidar2img'

    Args:
        to_float32 (bool): Whether to convert the img to float32.
            Defaults to False.
        color_type (str): Color type of the file. Defaults to 'unchanged'.
        backend_args (dict, optional): Arguments to instantiate the
            corresponding backend. Defaults to None.
        num_views (int): Number of view in a frame. Defaults to 5.
        num_ref_frames (int): Number of frame in loading. Defaults to -1.
        test_mode (bool): Whether is test mode in loading. Defaults to False.
        set_default_scale (bool): Whether to set default scale.
            Defaults to True.
    """

    def transform(self, results: dict) -> Optional[dict]:
        """Call function to load multi-view image from files.

        Args:
            results (dict): Result dict containing multi-view image filenames.

        Returns:
            dict: The result dict containing the multi-view image data.
            Added keys and values are described below.

                - filename (str): Multi-view image filenames.
                - img (np.ndarray): Multi-view image arrays.
                - img_shape (tuple[int]): Shape of multi-view image arrays.
                - ori_shape (tuple[int]): Shape of original image arrays.
                - pad_shape (tuple[int]): Shape of padded image arrays.
                - scale_factor (float): Scale factor.
                - img_norm_cfg (dict): Normalization configuration of images.
        """
        # TODO: consider split the multi-sweep part out of this pipeline
        filename, cam2img, lidar2cam, cam2lidar, lidar2img = [], [], [], [], []
        # NOTE: only the real CAM2 entry is loaded; no dummy views are added.
        filename.append(results['images']['CAM2']['img_path'])
        lidar2cam.append(results['images']['CAM2']['lidar2cam'])
        lidar2cam_array = np.array(
            results['images']['CAM2']['lidar2cam']).astype(np.float32)
        lidar2cam_rot = lidar2cam_array[:3, :3]
        lidar2cam_trans = lidar2cam_array[:3, 3:4]
        camera2lidar = np.eye(4)
        camera2lidar[:3, :3] = lidar2cam_rot.T
        camera2lidar[:3, 3:4] = -1 * np.matmul(
            lidar2cam_rot.T, lidar2cam_trans.reshape(3, 1))
        cam2lidar.append(camera2lidar)
        cam2img_array = np.eye(4).astype(np.float32)
        cam2img_array[:3, :3] = np.array(
            results['images']['CAM2']['cam2img'])[:3, :3].astype(np.float32)
        cam2img.append(cam2img_array)
        lidar2img.append(cam2img_array @ lidar2cam_array)
        results['img_path'] = filename
        results['cam2img'] = np.stack(cam2img, axis=0)
        results['lidar2cam'] = np.stack(lidar2cam, axis=0)
        results['cam2lidar'] = np.stack(cam2lidar, axis=0)
        results['lidar2img'] = np.stack(lidar2img, axis=0)
        results['ori_cam2img'] = copy.deepcopy(results['cam2img'])
        # img is of shape (h, w, c, num_views)
        # h and w can be different for different views
        img_bytes = [
            get(name, backend_args=self.backend_args) for name in filename]
        imgs = [
            mmcv.imfrombytes(
                img_byte,
                flag=self.color_type,
                backend='pillow',
                channel_order='rgb') for img_byte in img_bytes
        ]
        # handle the image with different shape
        img_shapes = np.stack([img.shape for img in imgs], axis=0)
        img_shape_max = np.max(img_shapes, axis=0)
        img_shape_min = np.min(img_shapes, axis=0)
        assert img_shape_min[-1] == img_shape_max[-1]
        if not np.all(img_shape_max == img_shape_min):
            pad_shape = img_shape_max[:2]
        else:
            pad_shape = None
        if pad_shape is not None:
            imgs = [
                mmcv.impad(img, shape=pad_shape, pad_val=0) for img in imgs
            ]
        img = np.stack(imgs, axis=-1)
        if self.to_float32:
            img = img.astype(np.float32)
        results['filename'] = filename
        # unravel to list, see `DefaultFormatBundle` in formating.py
        # which will transpose each image separately and then stack into array
        results['img'] = [img[..., i] for i in range(img.shape[-1])]
        results['img_shape'] = img.shape[:2]
        results['ori_shape'] = img.shape[:2]
        # Set initial values for default meta_keys
        results['pad_shape'] = img.shape[:2]
        if self.set_default_scale:
            results['scale_factor'] = 1.0
        num_channels = 1 if len(img.shape) < 3 else img.shape[2]
        results['img_norm_cfg'] = dict(
            mean=np.zeros(num_channels, dtype=np.float32),
            std=np.ones(num_channels, dtype=np.float32),
            to_rgb=False)
        results['num_views'] = self.num_views
        results['num_ref_frames'] = self.num_ref_frames
        return results
```
> For example, my modification in the mmdetection3d codebase is as follows.

Or could you give me your contact information?
Lists such as `cam2img` seem to have only 1 element. Please check the shape of the input image tensor and the corresponding calibration matrices.
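Concretely, one way to fix the snippet above is to pad every per-view list to `num_views` entries before stacking, mirroring the dummy-array approach shown earlier in this thread. A sketch, intended to follow the CAM2 appends in the `transform` method above:

```python
# After appending the real CAM2 entry, pad the remaining views with dummies.
num_dummy = self.num_views - len(filename)
for _ in range(num_dummy):
    filename.append('zero.png')  # placeholder handled at image-load time
    lidar2cam.append(np.eye(4))
    cam2lidar.append(np.eye(4))
    cam2img.append(np.eye(4))
    lidar2img.append(np.eye(4))
# np.stack(...) then yields (num_views, 4, 4) matrices instead of (1, 4, 4).
```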
> In here, you have to put the dummy array.

I made the changes according to what you said, but I still get that error. Could you tell me the details of the changes in mmdetection3d? I really can't figure it out, thank you very much!
@Lcl159 Have you checked the shapes of input image, calibration params, and img aug matrix?
> @Lcl159 Have you checked the shapes of input image, calibration params, and img aug matrix?

I printed the shapes of the parameters. Can you tell me where I should change them?
I cannot check the data in my code because I'm not in the lab now.
- I remember the img aug matrix has shape (B, N, 4, 4). However, your matrix has shape (B, 4, 4). Then, in the code you provided, `cur_img_aug_matrix` will be (4, 4), not (N, 4, 4).
- Why isn't the shape of the image (B, N, 3, H, W)? The channel count is 256. It's weird.
> Why isn't the shape of the image (B, N, 3, H, W)? The channel count is 256.

I'm confused too. It's been bugging me for a long time; I can't solve it and I've never been able to find out what the problem is.
```python
view_transform=dict(
    type='DepthLSSTransform',
    in_channels=256,
    out_channels=80,
    image_size=[256, 704],
    feature_size=[32, 88],
    xbound=[-54.0, 54.0, 0.3],
    ybound=[-54.0, 54.0, 0.3],
    zbound=[-10.0, 10.0, 20.0],
    dbound=[1.0, 60.0, 0.5],
    downsample=2)
```

Maybe it's because I didn't change it here?
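For what it's worth regarding the 256-channel question above: in the mmdetection3d BEVFusion project, `DepthLSSTransform` receives image backbone + neck features rather than raw RGB images, so 256 channels at that point can be expected rather than a bug. A rough shape trace under the config above, from my own reading of the code rather than a verified log:

```python
# B = batch size, N = number of views (placeholders).
# images after ImageAug3D:           (B, N, 3, 256, 704)
# backbone + neck output fed into
# DepthLSSTransform:                 (B, N, 256, 32, 88)
#   -> matches in_channels=256 and feature_size=[32, 88]
# BEV output:                        (B, 80, 180, 180)
#   -> out_channels=80; the 0.3 m x/y bounds give a 360 x 360 grid,
#      halved by downsample=2.
```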
Have you applied KITTI to BEVFusion successfully?
> Have you applied KITTI to BEVFusion successfully?

Hello, could I learn from you how to train on KITTI? I'd like to add you as a friend; my WeChat is v13731082126.
Hi, I haven't figured it out yet either.
> Hi, I haven't figured it out yet either.

OK, I have already finished it.
> OK, I have already finished it.

Could you share the code? The ONNX model I export after training with a single camera still has an image input of size (1, 6, 3, 256, 704).