
Reimplementation problems on Waymo

tangtaogo opened this issue 2 years ago · 7 comments

Hello, thanks for your excellent work! But I have a problem reproducing the results on the Waymo Open Dataset. With TransFusion-L I get:

'Overall/L1 mAP': 0.734978, 'Overall/L1 mAPH': 0.70693, 'Overall/L2 mAP': 0.671998, 'Overall/L2 mAPH': 0.645886

but the TransFusion-LC results come out worse:

'Overall/L1 mAP': 0.726501, 'Overall/L1 mAPH': 0.698618, 'Overall/L2 mAP': 0.663435, 'Overall/L2 mAPH': 0.637546

tangtaogo · May 06 '22 09:05

Hi, sorry for the late reply. Did you first pre-train the 2D backbone on Waymo? Since we did not find any off-the-shelf 2D backbones pretrained on the Waymo dataset, we followed the Mask R-CNN config (without the mask head) to train a ResNet50+FPN backbone on Waymo as the 2D feature extractor. We then use the following code to merge the pretrained 2D backbone and TransFusion-L into a single checkpoint, which serves as the load_from key of TransFusion.

import torch

# load the pretrained 2D backbone and the trained TransFusion-L checkpoint
img = torch.load('img_backbone.pth', map_location='cpu')
pts = torch.load('transfusionL.pth', map_location='cpu')

# start from the LiDAR weights, then copy the image backbone/neck weights
# over under the 'img_' prefix expected by TransFusion-LC
new_model = {"state_dict": pts["state_dict"]}
for k, v in img["state_dict"].items():
    if 'backbone' in k or 'neck' in k:
        new_model["state_dict"]['img_' + k] = v
        print('img_' + k)
torch.save(new_model, "fusion_model.pth")
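
The merged checkpoint is then referenced from the TransFusion-LC config via load_from; a minimal sketch of the relevant config line (the path matches the save above):

# in the TransFusion-LC config file
load_from = 'fusion_model.pth'  # merged 2D backbone + TransFusion-L weights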

XuyangBai · May 11 '22 09:05

Yes, I pre-trained a 2D backbone on Waymo first; the config and log are attached: waymo-2d-log.txt. I also froze both the image and LiDAR backbones during training. I don't know where my problem is. Could you provide your 2D Waymo model?

tangtaogo · May 12 '22 12:05

Sorry, I am not able to provide the model checkpoints. Your config looks good to me. One thing I forgot to mention is that I actually changed the data preprocessing of Waymo by modifying tools/data_converter/waymo_converter.py L267 from for labels in frame.projected_lidar_labels to for labels in frame.camera_labels. The reason is that the projected_lidar_labels usually do not tightly fit the image boxes and contain objects that are totally occluded in the image space. To verify whether your 2D backbone is well trained, you can visualize some Waymo 2D detections.
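
To make the visualization suggestion concrete, here is a minimal sketch assuming the standard waymo_open_dataset reading API (the segment path is a placeholder):

import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from waymo_open_dataset import dataset_pb2 as open_dataset

# read one frame from a segment file (placeholder path)
dataset = tf.data.TFRecordDataset('segment-xxx.tfrecord', compression_type='')
for data in dataset:
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    break

# overlay the camera_labels ground truth on each camera image
for image in frame.images:
    labels = next((cl for cl in frame.camera_labels if cl.name == image.name), None)
    if labels is None:
        continue
    fig, ax = plt.subplots()
    ax.imshow(tf.image.decode_jpeg(image.image).numpy())
    for label in labels.labels:
        box = label.box  # center_x/center_y/length/width in image pixels
        ax.add_patch(patches.Rectangle(
            (box.center_x - box.length / 2, box.center_y - box.width / 2),
            box.length, box.width, fill=False, edgecolor='lime'))
plt.show()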

XuyangBai · May 12 '22 13:05

Thanks for the kind reply, but if I directly change for labels in frame.projected_lidar_labels to for labels in frame.camera_labels, it does not work. Maybe it is the same problem as https://github.com/waymo-research/waymo-open-dataset/issues/141. Besides, training directly with projected_lidar_labels should not degrade the model either.
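
In case it helps others debug, a quick check (my own sketch, assuming the standard waymo_open_dataset API; the segment path is a placeholder) of whether a segment carries camera_labels at all:

import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset

# count how many frames in a segment actually carry 2D camera_labels
total = labeled = 0
for data in tf.data.TFRecordDataset('segment-xxx.tfrecord', compression_type=''):
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    total += 1
    labeled += int(len(frame.camera_labels) > 0)
print(f'{labeled}/{total} frames have camera_labels')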

tangtaogo · May 13 '22 04:05

@Trent-tangtao, hi, I also ran into the same issue. Have you managed to obtain a more reasonable result with TransFusion-LC on Waymo?

yinjunbo · Sep 06 '22 03:09

Hello, I have a question about the Waymo experiments. The LiDAR covers a 360° field of view, but the cameras cover only around 120°, so how do you handle the parts of the field of view where the two modalities do not overlap?

Liaoqing-up · Nov 23 '22 11:11

Hi, could you please share your LiDAR-only training log?

Gaoeee · Apr 24 '24 08:04