InterFuser
More details regarding training
Great work!
From the paper "The backbone for encoding information from multi-view RGB images is Resnet-50 pretrained on ImageNet [56], and the backbone for processing LiDAR BEV representations is ResNet-18 trained from scratch."
Could you please provide more details regarding training these backbones? How long did you train? How large is the dataset? Did you use the same 8 V100 GPUs? Which dataset did you use to train ResNet-18? After pre-training, did you do any further fine-tuning for these two backbones, or did you just use the pretrained weights directly?
Thank you very much! :)
Hi!
- We train the model on 8 A100 GPUs for 3~5 days (it usually depends on the I/O speed).
- The dataset we used is about 1-2 TB.
- We do not use any other dataset to train ResNet-18. For ResNet-50, we just use the ImageNet-pretrained weights.
- In our work, we train the model in only one stage.
- For other details, please refer to our paper https://arxiv.org/abs/2207.14024
Hi!
I notice that, regarding pretrained weights, you mention "We also provide exemplary model weights" and "Note: The model is trained on the part of the full dataset with several towns and weathers.". Does this mean that the pretrained weights you provide are not the same as in the paper and are just an example? If so, does it mean that to reproduce the results from the paper, we have to generate the full dataset first and train the model ourselves?
Thank you very much! :)
Yes, those weights are trained on a subset of the FULL dataset. And I think generating the full dataset is a prerequisite for reproducing the results of the paper.
Thank you very much! Do you plan to release the pretrained weights from your paper? :)
I have similar questions, are the pretrained weights available somewhere?
@Mariusmarten @zhenggruk Hi, the pretrained weights we used are from the PyTorch-Vision repo.