TransFusion
Reimplementation problems on Waymo
Hello, thanks for your excellent work! But I have a problem reproducing results on the Waymo Open Dataset. TransFusion-L gives:

'Overall/L1 mAP': 0.734978, 'Overall/L1 mAPH': 0.70693, 'Overall/L2 mAP': 0.671998, 'Overall/L2 mAPH': 0.645886

but TransFusion-LC gets worse results:

'Overall/L1 mAP': 0.726501, 'Overall/L1 mAPH': 0.698618, 'Overall/L2 mAP': 0.663435, 'Overall/L2 mAPH': 0.637546
Hi, sorry for the late reply. Did you first pre-train the 2D backbone on Waymo? Since we did not find any off-the-shelf 2D backbones pre-trained on the Waymo dataset, we followed the Mask R-CNN config without the mask head to train a ResNet-50+FPN backbone on Waymo as the 2D feature extractor. Then we use the following code to combine the pre-trained 2D backbone and TransFusion-L into the checkpoint used as the `load_from` key of TransFusion:
```python
import torch

# Load the pre-trained 2D backbone and the LiDAR-only checkpoint.
img = torch.load('img_backbone.pth', map_location='cpu')
pts = torch.load('transfusionL.pth', map_location='cpu')

# Start from the LiDAR weights, then copy in the image backbone/neck
# weights under the 'img_' prefix expected by the fusion model.
new_model = {"state_dict": pts["state_dict"]}
for k, v in img["state_dict"].items():
    if 'backbone' in k or 'neck' in k:
        new_model["state_dict"]['img_' + k] = v
        print('img_' + k)

torch.save(new_model, "fusion_model.pth")
```
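For a quick sanity check that the merge behaves as intended, the key-prefixing logic can be factored into a plain function and exercised on dummy state dicts (no checkpoints or GPU needed). `merge_checkpoints` is a hypothetical helper written for this sketch, not part of the TransFusion repo:

```python
def merge_checkpoints(img_state, pts_state):
    """Combine a 2D image backbone state dict with a LiDAR-only
    TransFusion-L state dict. Only backbone/neck keys from the image
    model are copied in, under an 'img_' prefix; detection-head keys
    (e.g. the RPN) are dropped."""
    merged = dict(pts_state)  # keep all LiDAR weights as-is
    for k, v in img_state.items():
        if 'backbone' in k or 'neck' in k:
            merged['img_' + k] = v
    return merged

# Dummy state dicts stand in for the real checkpoints.
img = {'backbone.layer1.weight': 1, 'neck.lateral.weight': 2,
       'rpn_head.cls.weight': 3}
pts = {'pts_backbone.conv1.weight': 4}
fused = merge_checkpoints(img, pts)
```

Here `fused` keeps the LiDAR key untouched, gains `img_backbone.layer1.weight` and `img_neck.lateral.weight`, and drops the RPN head.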
Yes, I pre-trained a 2D backbone on Waymo first; the config and log are attached: waymo-2d-log.txt. I also froze both the image and LiDAR backbones during training. I don't know where my problem is. Could you provide your Waymo 2D model?
Sorry, I am not able to provide the model checkpoints. Your config looks good to me. One thing I forgot to mention is that I actually changed the Waymo data pre-processing in tools/data_converter/waymo_converter.py (L267), from

for labels in frame.projected_lidar_labels

to

for labels in frame.camera_labels

The reason is that the projected_lidar_labels usually do not tightly fit the image boxes and contain objects that are totally occluded in image space. And to verify whether your 2D backbone is well trained or not, you can visualize its Waymo 2D detections.
Thanks for the kind reply, but if I directly change for labels in frame.projected_lidar_labels to for labels in frame.camera_labels, it does not work. Maybe it is the same problem as https://github.com/waymo-research/waymo-open-dataset/issues/141. And training directly with projected_lidar_labels shouldn't degrade the model either.
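One possible workaround, assuming the problem is indeed frames that ship without camera annotations (as discussed in the linked issue), is to fall back to the projected LiDAR labels whenever `frame.camera_labels` is empty. This is only a sketch: `pick_2d_labels` is a hypothetical helper, and `SimpleNamespace` objects stand in for the real Waymo Frame protos:

```python
from types import SimpleNamespace

def pick_2d_labels(frame):
    """Prefer hand-annotated camera boxes; fall back to projected
    LiDAR labels when a frame has no camera annotations at all."""
    if any(camera.labels for camera in frame.camera_labels):
        return frame.camera_labels
    return frame.projected_lidar_labels

# Minimal stand-ins for annotated and unannotated frames.
annotated = SimpleNamespace(
    camera_labels=[SimpleNamespace(labels=['box'])],
    projected_lidar_labels=[SimpleNamespace(labels=['proj_box'])])
unannotated = SimpleNamespace(
    camera_labels=[SimpleNamespace(labels=[])],
    projected_lidar_labels=[SimpleNamespace(labels=['proj_box'])])
```

This keeps the converter from silently producing empty 2D annotations for the affected segments, at the cost of mixing the looser projected boxes back in for those frames.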
@Trent-tangtao, hi, I also encountered the same issue. Have you managed to obtain a more reasonable result with TransFusion-LC on Waymo?
Hello, I have a question about the Waymo experiments. The LiDAR covers a 360° field of view but the cameras only cover around 120°, so how do you handle the regions where the two modalities don't overlap?
Hi, could you please share your LiDAR-only training log?