3DTrans: An Open-source Codebase for Continuous Learning towards Autonomous Driving Task

3DTrans is an open-source codebase for exploring autonomous driving pre-training. It includes Transfer Learning Techniques and Scalable Pre-training Techniques for tackling the continuous learning problem in autonomous driving, as follows.
- We implement the Transfer Learning Techniques, which consist of four settings:
  - Unsupervised Domain Adaptation (UDA) for 3D Point Clouds
  - Active Domain Adaptation (ADA) for 3D Point Clouds
  - Semi-Supervised Domain Adaptation (SSDA) for 3D Point Clouds
  - Multi-dataset Domain Fusion (MDF) for 3D Point Clouds
- We implement the Scalable Pre-training Techniques, which can continuously enhance model performance on downstream tasks as more pre-training data are fed into the pre-training network:
  - AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset
  - SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving
Overview
- News
- Installation for 3DTrans
- Getting Started
- Transfer Learning Techniques@3DTrans
  - Model Zoo: Domain Transfer Results
- Scalable Pre-training Techniques@3DTrans
  - Model Zoo: AD-PT Results
  - ReSimAD
- Visualization Tools for 3DTrans
- 3DTrans Framework Introduction
- Acknowledge
- Citation
News :fire:
- [x] We have released all code of AD-PT here, including: 1) pre-training and fine-tuning methods, 2) labeled and pseudo-labeled data, and 3) pre-trained checkpoints for fine-tuning. Please see AD-PT for more technical details (updated on Sep. 2023).
- [x] SPOT shows that occupancy prediction is a promising pre-training method for general and scalable 3D representation learning; see Figure 1 of the SPOT paper for the experimental results (updated on Sep. 2023).
- [x] We have released the Reconstruction-Simulation Dataset obtained using the ReSimAD method (updated on Sep. 2023).
- [x] We have released the AD-PT pre-trained checkpoints; see AD-PT pre-trained checkpoints (updated on Aug. 2023).
- [x] Based on 3DTrans, we achieved significant performance gains on a series of downstream perception benchmarks, including Waymo, nuScenes, and KITTI, under different baseline models such as PV-RCNN++, SECOND, CenterPoint, and PV-RCNN (updated on Jun. 2023).
- [x] Our 3DTrans supports Semi-Supervised Domain Adaptation (SSDA) for 3D object detection (updated on Nov. 2022).
- [x] Our 3DTrans supports Active Domain Adaptation (ADA) for 3D object detection, achieving a good trade-off between high performance and annotation cost (updated on Oct. 2022).
- [x] Our 3DTrans supports several typical transfer learning techniques (such as TQS, CLUE, SN, ST3D, Pseudo-labeling, SESS, and Mean-Teacher) for autonomous-driving-related model adaptation and transfer.
- [x] Our 3DTrans supports Multi-dataset Domain Fusion (MDF) for 3D object detection, enabling existing 3D models to effectively learn from multiple off-the-shelf 3D datasets (updated on Sep. 2022).
- [x] Our 3DTrans supports Unsupervised Domain Adaptation (UDA) for 3D object detection, deploying a well-trained source model to an unlabeled target domain (updated on Jul. 2022).
- [x] We calculate the object-size distribution for each public AD dataset in object-size statistics.
We hope this repository will inspire further research on 3D model generalization by pushing the limits of perception performance. :tokyo_tower:
Installation for 3DTrans
You may refer to INSTALL.md for the installation of 3DTrans.
Getting Started
Getting Started for ALL Settings
- Please refer to Readme for Datasets to prepare the datasets and convert the data into the 3DTrans format. Besides, 3DTrans supports reading and writing data from Ceph Petrel-OSS; please refer to Readme for Datasets for more details (a minimal config-loading sketch is given after this list).
- Please refer to Readme for UDA for the problem definition of UDA and for performing the UDA adaptation process.
- Please refer to Readme for ADA for the problem definition of ADA and for performing the ADA adaptation process.
- Please refer to Readme for SSDA for the problem definition of SSDA and for performing the SSDA adaptation process.
- Please refer to Readme for MDF for the problem definition of MDF and for performing the MDF joint-training process.
- Please refer to Readme for ReSimAD for the ReSimAD implementation.
- Please refer to Readme for AD-PT Pre-training to start the journey of 3D perception pre-training using AD-PT.
- Please refer to Readme for PointContrast Pre-training for 3D perception pre-training using PointContrast.
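For orientation, the snippet below is a minimal sketch of the config-driven, OpenPCDet-style workflow that 3DTrans builds on: load a YAML config, build the dataloader and the network, and hand them to the provided train/test scripts or your own loop. The config path and batch size are placeholders, and the assumption that 3DTrans keeps OpenPCDet's `pcdet` package layout is exactly that, an assumption; please follow the per-setting readmes above for the exact commands.

```python
# Hypothetical sketch of the OpenPCDet-style workflow that 3DTrans builds on.
# The config path below is a placeholder; use one of the repository's YAML files.
from pcdet.config import cfg, cfg_from_yaml_file  # assumed to follow OpenPCDet's layout
from pcdet.datasets import build_dataloader
from pcdet.models import build_network

cfg_from_yaml_file('tools/cfgs/path_to_your_config.yaml', cfg)  # placeholder path

train_set, train_loader, train_sampler = build_dataloader(
    dataset_cfg=cfg.DATA_CONFIG,
    class_names=cfg.CLASS_NAMES,
    batch_size=4,          # placeholder batch size
    dist=False,            # single-GPU for simplicity
    workers=4,
    training=True,
)

model = build_network(model_cfg=cfg.MODEL, num_class=len(cfg.CLASS_NAMES), dataset=train_set)
model.cuda()  # assumes a GPU is available
# ...plug model and train_loader into the provided train/test scripts or your own loop.
```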
Model Zoo
We cannot provide the Waymo-related pre-trained models due to the Waymo Dataset License Agreement, but you can easily achieve similar performance by training with the corresponding configs.
Domain Transfer Results
UDA Results
Here, we report the cross-dataset (Waymo-to-KITTI) adaptation results using the BEV/3D AP performance as the evaluation metric. Please refer to Readme for UDA for experimental results of more cross-domain settings.
- All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
- For Waymo dataset training, we train the model using 20% of the data.
- The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
- Pre-SN means that we perform the SN (statistical normalization) operation during the stage of pre-training the source-only model.
- Post-SN means that we perform the SN (statistical normalization) operation during the adaptation stage (a minimal SN sketch is given below the table).
| Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download |
|---|---|---|---|---|
| PointPillar | ~7.1 hours | Source-only with SN | 74.98 / 49.31 | - |
| PointPillar | ~0.6 hours | Pre-SN | 81.71 / 57.11 | model-57M |
| PV-RCNN | ~23 hours | Source-only with SN | 69.92 / 60.17 | - |
| PV-RCNN | ~23 hours | Source-only | 74.42 / 40.35 | - |
| PV-RCNN | ~3.5 hours | Pre-SN | 84.00 / 74.57 | model-156M |
| PV-RCNN | ~1.0 hours | Post-SN | 84.94 / 75.20 | model-156M |
| Voxel R-CNN | ~16 hours | Source-only with SN | 75.83 / 55.50 | - |
| Voxel R-CNN | ~16 hours | Source-only | 64.88 / 19.90 | - |
| Voxel R-CNN | ~2.5 hours | Pre-SN | 82.56 / 67.32 | model-201M |
| Voxel R-CNN | ~2.2 hours | Post-SN | 85.44 / 76.78 | model-201M |
| PV-RCNN++ | ~20 hours | Source-only with SN | 67.22 / 56.50 | - |
| PV-RCNN++ | ~20 hours | Source-only | 67.68 / 20.82 | - |
| PV-RCNN++ | ~2.2 hours | Post-SN | 86.86 / 79.86 | model-193M |
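For reference, SN (statistical normalization) rescales each labeled source object, i.e. its bounding box and the LiDAR points inside it, by the gap between the source and target mean object sizes, so that the source data better matches target-domain object statistics. The function below is a simplified, standalone sketch of that idea (boxes assumed in the `[x, y, z, l, w, h, heading]` layout with a centered z); it is not the exact SN implementation shipped with 3DTrans.

```python
import numpy as np

def statistical_normalization(points, gt_boxes, size_delta):
    """Simplified SN sketch: resize each GT box (and the points inside it) by the
    per-dimension gap between target and source mean object size.

    points:     (N, 3+) point cloud, xyz in the first three columns
    gt_boxes:   (M, 7) boxes as [x, y, z, l, w, h, heading]
    size_delta: (3,) target_mean_size - source_mean_size, i.e. [dl, dw, dh]
    """
    points, gt_boxes = points.copy(), gt_boxes.copy()
    for box in gt_boxes:
        cx, cy, cz, l, w, h, yaw = box
        # transform points into the box's local frame
        shifted = points[:, :3] - np.array([cx, cy, cz])
        cos_a, sin_a = np.cos(-yaw), np.sin(-yaw)
        local = shifted.copy()
        local[:, 0] = cos_a * shifted[:, 0] - sin_a * shifted[:, 1]
        local[:, 1] = sin_a * shifted[:, 0] + cos_a * shifted[:, 1]
        mask = (np.abs(local[:, 0]) <= l / 2) & (np.abs(local[:, 1]) <= w / 2) \
               & (np.abs(local[:, 2]) <= h / 2)
        # scale the in-box points so they fill the resized box
        scale = (np.array([l, w, h]) + size_delta) / np.array([l, w, h])
        local[mask] *= scale
        # transform back to the LiDAR frame
        cos_b, sin_b = np.cos(yaw), np.sin(yaw)
        back = local.copy()
        back[:, 0] = cos_b * local[:, 0] - sin_b * local[:, 1]
        back[:, 1] = sin_b * local[:, 0] + cos_b * local[:, 1]
        points[mask, :3] = back[mask] + np.array([cx, cy, cz])
        box[3:6] += size_delta  # resize the box itself
    return points, gt_boxes
```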
ADA Results
Here, we report the Waymo-to-KITTI adaptation results using the BEV/3D AP performance. Please refer to Readme for ADA for experimental results of more cross-domain settings.
- All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
- For Waymo dataset training, we train the model using 20% of the data.
- The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
| Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download |
|---|---|---|---|---|
| PV-RCNN | ~23h@4 A100 | Source Only | 67.95 / 27.65 | - |
| PV-RCNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 87.12 / 78.03 | Model-58M |
| PV-RCNN | ~10h@2 A100 | Bi3D (5% annotation budget) | 89.53 / 81.32 | Model-58M |
| PV-RCNN | ~1.5h@2 A100 | TQS | 82.00 / 72.04 | Model-58M |
| PV-RCNN | ~1.5h@2 A100 | CLUE | 82.13 / 73.14 | Model-50M |
| PV-RCNN | ~10h@2 A100 | Bi3D+ST3D | 87.83 / 81.23 | Model-58M |
| Voxel R-CNN | ~16h@4 A100 | Source Only | 64.87 / 19.90 | - |
| Voxel R-CNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 88.09 / 79.14 | Model-72M |
| Voxel R-CNN | ~6h@2 A100 | Bi3D (5% annotation budget) | 90.18 / 81.34 | Model-72M |
| Voxel R-CNN | ~1.5h@2 A100 | TQS | 78.26 / 67.11 | Model-72M |
| Voxel R-CNN | ~1.5h@2 A100 | CLUE | 81.93 / 70.89 | Model-72M |
SSDA Results
We report the target-domain results for Waymo-to-nuScenes adaptation using the BEV/3D AP performance as the evaluation metric, and for Waymo-to-ONCE adaptation using the ONCE evaluation metric. Please refer to Readme for SSDA for experimental results of more cross-domain settings.
- The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
- For Waymo dataset training, we train the model using 20% of the data.
- second_5%_FT denotes that we fine-tune the Second model using 5% of the nuScenes training data.
- second_5%_SESS denotes that we adapt the baseline model using the SESS (Self-Ensembling Semi-Supervised) method.
- second_5%_PS denotes that we fine-tune the source-only model on nuScenes using 5% labeled data and perform pseudo-labeling on the remaining 95% unlabeled nuScenes data (a minimal pseudo-labeling sketch is given below the table).
| Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download |
|---|---|---|---|---|
| Second | ~11 hours | source-only (Waymo) | 27.85 / 16.43 | - |
| Second | ~0.4 hours | second_5%_FT | 45.95 / 26.98 | model-61M |
| Second | ~1.8 hours | second_5%_SESS | 47.77 / 28.74 | model-61M |
| Second | ~1.7 hours | second_5%_PS | 47.72 / 29.37 | model-61M |
| PV-RCNN | ~24 hours | source-only (Waymo) | 40.31 / 23.32 | - |
| PV-RCNN | ~1.0 hours | pvrcnn_5%_FT | 49.58 / 34.86 | model-150M |
| PV-RCNN | ~5.5 hours | pvrcnn_5%_SESS | 49.92 / 35.28 | model-150M |
| PV-RCNN | ~5.4 hours | pvrcnn_5%_PS | 49.84 / 35.07 | model-150M |
| PV-RCNN++ | ~16 hours | source-only (Waymo) | 31.96 / 19.81 | - |
| PV-RCNN++ | ~1.2 hours | pvplus_5%_FT | 49.94 / 34.28 | model-185M |
| PV-RCNN++ | ~4.2 hours | pvplus_5%_SESS | 51.14 / 35.25 | model-185M |
| PV-RCNN++ | ~3.6 hours | pvplus_5%_PS | 50.84 / 35.39 | model-185M |
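For reference, the `_PS` rows follow the usual pseudo-labeling recipe: run the fine-tuned detector on the unlabeled split, keep only high-confidence predictions as pseudo ground truth, and re-train on the union of labeled and pseudo-labeled frames. The sketch below covers only the filtering step, with an assumed OpenPCDet-style prediction layout and a hand-picked score threshold; it is not the exact 3DTrans implementation.

```python
import torch

@torch.no_grad()
def generate_pseudo_labels(model, unlabeled_loader, score_thresh=0.6):
    """Schematic pseudo-labeling pass: keep confident predictions as labels.

    Assumes each prediction dict holds 'pred_boxes' (M, 7), 'pred_scores' (M,)
    and 'pred_labels' (M,), i.e. the usual OpenPCDet-style output layout, and
    that batches are already on the right device.
    """
    model.eval()
    pseudo_labels = {}
    for batch_dict in unlabeled_loader:
        pred_dicts, _ = model(batch_dict)  # forward pass in eval mode
        for frame_id, preds in zip(batch_dict['frame_id'], pred_dicts):
            keep = preds['pred_scores'] > score_thresh
            pseudo_labels[frame_id] = {
                'gt_boxes': preds['pred_boxes'][keep].cpu(),
                'gt_labels': preds['pred_labels'][keep].cpu(),
                'scores': preds['pred_scores'][keep].cpu(),
            }
    return pseudo_labels  # merged with the 5% labeled split for re-training
```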
- For Waymo-to-ONCE adaptation, we employ 8 NVIDIA A100 GPUs for model training.
- PS denotes that we pseudo-label the unlabeled ONCE data and re-train the model on the pseudo-labeled data.
- SESS denotes that we utilize the SESS method to adapt the baseline (a minimal mean-teacher sketch is given below the table).
- For ONCE, the IoU thresholds for evaluation are 0.7, 0.3, and 0.5 for Vehicle, Pedestrian, and Cyclist, respectively.
| Model | Training ONCE Data | Methods | Vehicle@AP | Pedestrian@AP | Cyclist@AP | Download |
|---|---|---|---|---|---|---|
| CenterPoint | Labeled (4K) | Train from scratch | 74.93 | 46.21 | 67.36 | model-96M |
| CenterPoint_Pede | Labeled (4K) | PS | - | 49.14 | - | model-96M |
| PV-RCNN++ | Labeled (4K) | Train from scratch | 79.78 | 35.91 | 63.18 | model-188M |
| PV-RCNN++ | Small Dataset (100K) | SESS | 80.02 | 46.24 | 66.41 | model-188M |
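For reference, SESS follows a mean-teacher scheme: a teacher copy of the detector is updated as an exponential moving average (EMA) of the student's weights and provides consistency targets on unlabeled data. The sketch below shows only the EMA update, which is the core of the scheme; the consistency losses are omitted and the momentum value is illustrative.

```python
import torch

@torch.no_grad()
def update_teacher_ema(teacher, student, momentum=0.999):
    """Mean-teacher EMA update (core of SESS-style self-ensembling):
    teacher_param <- m * teacher_param + (1 - m) * student_param.
    """
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)
    # buffers (e.g. BatchNorm running stats) are usually copied directly
    for t_buf, s_buf in zip(teacher.buffers(), student.buffers()):
        t_buf.copy_(s_buf)
```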
MDF Results
Here, we report the Waymo-and-nuScenes consolidation results. The models are jointly trained on the Waymo and nuScenes datasets, and evaluated on Waymo using the mAP/mAPH (LEVEL_2) metric and on nuScenes using the BEV/3D AP metric. Please refer to Readme for MDF for more results.
- All LiDAR-based models are trained with 8 NVIDIA A100 GPUs and are available for download.
- The multi-domain dataset fusion (MDF) training time is measured with 8 NVIDIA A100 GPUs and PyTorch 1.8.1.
- For Waymo dataset training, we train the model using 20% of the training data to save training time.
- PV-RCNN-nuScenes means that we train the PV-RCNN model using only the nuScenes dataset, and PV-RCNN-DM indicates that we merge the Waymo and nuScenes datasets and train on the merged dataset. Besides, PV-RCNN-DT denotes the domain attention-aware multi-dataset training (a minimal domain-attention sketch is given after the tables below).
Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
---|---|---|---|---|---|---|---|
PV-RCNN-nuScenes | only nuScenes | 35.59 / 35.21 | 3.95 / 2.55 | 0.94 / 0.92 | 57.78 / 41.10 | 24.52 / 18.56 | 10.24 / 8.25 |
PV-RCNN-Waymo | only Waymo | 66.49 / 66.01 | 64.09 / 58.06 | 62.09 / 61.02 | 32.99 / 17.55 | 3.34 / 1.94 | 0.02 / 0.01 |
PV-RCNN-DM | Direct Merging | 57.82 / 57.40 | 48.24 / 42.81 | 54.63 / 53.64 | 48.67 / 30.43 | 12.66 / 8.12 | 1.67 / 1.04 |
PV-RCNN-Uni3D | Uni3D | 66.98 / 66.50 | 65.70 / 59.14 | 61.49 / 60.43 | 60.77 / 42.66 | 27.44 / 21.85 | 13.50 / 11.87 |
PV-RCNN-DT | Domain Attention | 67.27 / 66.77 | 65.86 / 59.38 | 61.38 / 60.34 | 60.83 / 43.03 | 27.46 / 22.06 | 13.82 / 11.52 |
Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
---|---|---|---|---|---|---|---|
Voxel-RCNN-nuScenes | only nuScenes | 31.89 / 31.65 | 3.74 / 2.57 | 2.41 / 2.37 | 53.63 / 39.05 | 22.48 / 17.85 | 10.86 / 9.70 |
Voxel-RCNN-Waymo | only Waymo | 67.05 / 66.41 | 66.75 / 60.83 | 63.13 / 62.15 | 34.10 / 17.31 | 2.99 / 1.69 | 0.05 / 0.01 |
Voxel-RCNN-DM | Direct Merging | 58.26 / 57.87 | 52.72 / 47.11 | 50.26 / 49.50 | 51.40 / 31.68 | 15.04 / 9.99 | 5.40 / 3.87 |
Voxel-RCNN-Uni3D | Uni3D | 66.76 / 66.29 | 66.62 / 60.51 | 63.36 / 62.42 | 60.18 / 42.23 | 30.08 / 24.37 | 14.60 / 12.32 |
Voxel-RCNN-DT | Domain Attention | 66.96 / 66.50 | 68.23 / 62.00 | 62.57 / 61.64 | 60.42 / 42.81 | 30.49 / 24.92 | 15.91 / 13.35 |
Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
---|---|---|---|---|---|---|---|
PV-RCNN++-DM | Direct Merging | 63.79 / 63.38 | 55.03 / 49.75 | 59.88 / 58.99 | 50.91 / 31.46 | 17.07 / 12.15 | 3.10 / 2.20 |
PV-RCNN++-Uni3D | Uni3D | 68.55 / 68.08 | 69.83 / 63.60 | 64.90 / 63.91 | 62.51 / 44.16 | 33.82 / 27.18 | 22.48 / 19.30 |
PV-RCNN++-DT | Domain Attention | 68.51 / 68.05 | 69.81 / 63.58 | 64.39 / 63.43 | 62.33 / 44.16 | 33.44 / 26.94 | 21.64 / 18.52 |
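For reference, the "Domain Attention" rows use dataset-aware channel re-weighting: each dataset gets its own lightweight SE-style excitation branch on top of shared features, so the shared backbone can adapt its channels per domain. The module below is a minimal sketch of that general idea under assumed shapes (BEV feature maps of shape (B, C, H, W)); it is not the exact Uni3D/3DTrans implementation.

```python
import torch
import torch.nn as nn

class DomainAttention(nn.Module):
    """Minimal dataset-conditioned SE block: shared features, per-dataset excitation."""

    def __init__(self, channels, num_datasets, reduction=4):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )
            for _ in range(num_datasets)
        ])

    def forward(self, feats, dataset_id):
        # feats: (B, C, H, W) shared BEV features; dataset_id selects the branch
        squeeze = feats.mean(dim=(2, 3))             # (B, C) global average pooling
        excite = self.branches[dataset_id](squeeze)  # (B, C) per-dataset channel gates
        return feats * excite.unsqueeze(-1).unsqueeze(-1)

# e.g. attn = DomainAttention(channels=256, num_datasets=2)
#      out = attn(bev_features, dataset_id=0)  # 0 = Waymo, 1 = nuScenes (illustrative)
```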
3D Pre-training Results
AD-PT Results on Waymo
AD-PT demonstrates strong generalization ability on 3D point clouds. We first pre-train the 3D and 2D backbones using AD-PT on the ONCE dataset (from 100K to 1M frames), and then fine-tune the model on different datasets. Here, we report the results of fine-tuning on Waymo (a minimal sketch of loading a pre-trained backbone is given below the table).
| Model | Data amount | Overall | Vehicle | Pedestrian | Cyclist |
|---|---|---|---|---|---|
| SECOND (From scratch) | 3% | 52.00 / 37.70 | 58.11 / 57.44 | 51.34 / 27.38 | 46.57 / 28.28 |
| SECOND (AD-PT) | 3% | 55.41 / 51.78 | 60.53 / 59.93 | 54.91 / 45.78 | 50.79 / 49.65 |
| SECOND (From scratch) | 20% | 60.62 / 56.86 | 64.26 / 63.73 | 59.72 / 50.38 | 57.87 / 56.48 |
| SECOND (AD-PT) | 20% | 61.26 / 57.69 | 64.54 / 64.00 | 60.25 / 51.21 | 59.00 / 57.86 |
| CenterPoint (From scratch) | 3% | 59.00 / 56.29 | 57.12 / 56.57 | 58.66 / 52.44 | 61.24 / 59.89 |
| CenterPoint (AD-PT) | 3% | 61.21 / 58.46 | 60.35 / 59.79 | 60.57 / 54.02 | 62.73 / 61.57 |
| CenterPoint (From scratch) | 20% | 66.47 / 64.01 | 64.91 / 64.42 | 66.03 / 60.34 | 68.49 / 67.28 |
| CenterPoint (AD-PT) | 20% | 67.17 / 64.65 | 65.33 / 64.83 | 67.16 / 61.20 | 69.39 / 68.25 |
| PV-RCNN++ (From scratch) | 3% | 63.81 / 61.10 | 64.42 / 63.93 | 64.33 / 57.79 | 62.69 / 61.59 |
| PV-RCNN++ (AD-PT) | 3% | 68.33 / 65.69 | 68.17 / 67.70 | 68.82 / 62.39 | 68.00 / 67.00 |
| PV-RCNN++ (From scratch) | 20% | 69.97 / 67.58 | 69.18 / 68.75 | 70.88 / 65.21 | 69.84 / 68.77 |
| PV-RCNN++ (AD-PT) | 20% | 71.55 / 69.23 | 70.62 / 70.19 | 72.36 / 66.82 | 71.69 / 70.70 |
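For reference, fine-tuning from an AD-PT checkpoint amounts to initializing the detector's backbone with the pre-trained weights and then training as usual on the labeled subset. The sketch below shows a generic partial `state_dict` load in plain PyTorch; the checkpoint path and key layout are assumptions, so follow Readme for AD-PT Pre-training for the exact procedure.

```python
import torch

def load_pretrained_backbone(model, ckpt_path='ad_pt_checkpoint.pth'):
    """Generic partial-weight loading: copy every pre-trained tensor whose name
    and shape match the detector, leave the detection head randomly initialized."""
    checkpoint = torch.load(ckpt_path, map_location='cpu')
    pretrained = checkpoint.get('model_state', checkpoint)  # key layout is an assumption
    model_state = model.state_dict()
    matched = {
        k: v for k, v in pretrained.items()
        if k in model_state and v.shape == model_state[k].shape
    }
    model_state.update(matched)
    model.load_state_dict(model_state)
    print(f'Loaded {len(matched)} / {len(model_state)} tensors from {ckpt_path}')
    return model
```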
ReSimAD
ReSimAD Implementation
Here, we provide the download link of our reconstruction-simulation dataset produced by ReSimAD, consisting of nuScenes-like, KITTI-like, ONCE-like, and Waymo-like datasets of target-domain-like simulated points.
Specifically, please refer to ReSimAD reconstruction for the point-based reconstruction meshes, and to PCSim for the technical details of simulating target-domain-like points based on the reconstructed meshes (a minimal ray-casting sketch is given below the table). For the perception module, please refer to PV-RCNN and PV-RCNN++ for model training and evaluation.
We report the zero-shot cross-dataset (Waymo-to-nuScenes) adaptation results using the BEV/3D AP performance as the evaluation metric for a fair comparison. Please refer to ReSimAD for more details.
| Methods | Training time | Adaptation | Car@R40 (BEV / 3D) | Ckpt |
|---|---|---|---|---|
| PV-RCNN | ~23 hours | Source-only | 31.02 / 17.75 | Not Available (Waymo License) |
| PV-RCNN | ~8 hours | ST3D | 36.42 / 22.99 | - |
| PV-RCNN | ~8 hours | ReSimAD | 37.85 / 21.33 | ReSimAD_ckpt |
| PV-RCNN++ | ~20 hours | Source-only | 29.93 / 18.77 | Not Available (Waymo License) |
| PV-RCNN++ | ~2.2 hours | ST3D | 34.68 / 17.17 | - |
| PV-RCNN++ | ~8 hours | ReSimAD | 40.73 / 23.72 | ReSimAD_ckpt |
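For reference, the simulation half of ReSimAD re-renders LiDAR-like points by casting rays from a virtual sensor against the reconstructed meshes, using a target-domain-like beam pattern. The snippet below is a heavily simplified, standalone sketch of that idea using Open3D's ray-casting API; the mesh path, beam layout, and ranges are all illustrative and do not reproduce the actual ReSimAD/PCSim pipeline.

```python
import numpy as np
import open3d as o3d

def simulate_lidar_from_mesh(mesh_path, num_beams=32, num_azimuth=1024,
                             fov_up=10.0, fov_down=-30.0, max_range=80.0):
    """Cast a simple rotating-beam pattern against a reconstructed mesh and
    return the hit points as a pseudo LiDAR sweep (sensor at the origin)."""
    mesh = o3d.io.read_triangle_mesh(mesh_path)
    scene = o3d.t.geometry.RaycastingScene()
    scene.add_triangles(o3d.t.geometry.TriangleMesh.from_legacy(mesh))

    # build ray directions for a crude spinning-LiDAR beam pattern
    elev = np.deg2rad(np.linspace(fov_down, fov_up, num_beams))
    azim = np.linspace(-np.pi, np.pi, num_azimuth, endpoint=False)
    elev, azim = np.meshgrid(elev, azim, indexing='ij')
    dirs = np.stack([np.cos(elev) * np.cos(azim),
                     np.cos(elev) * np.sin(azim),
                     np.sin(elev)], axis=-1).reshape(-1, 3)

    rays = np.concatenate([np.zeros_like(dirs), dirs], axis=-1).astype(np.float32)
    hits = scene.cast_rays(o3d.core.Tensor(rays))
    t_hit = hits['t_hit'].numpy()
    valid = np.isfinite(t_hit) & (t_hit < max_range)
    return dirs[valid] * t_hit[valid, None]  # (N, 3) simulated points
```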
Visualization Tools for 3DTrans
- Our 3DTrans supports the sequence-level visualization function Quick Sequence Demo to continuously display the prediction results or ground truth of a selected scene (a minimal standalone viewer sketch is given after the demo list below).
Visualization Demo
- Waymo Sequence-level Visualization Demo1
- Waymo Sequence-level Visualization Demo2
- nuScenes Sequence-level Visualization Demo
- ONCE Sequence-level Visualization Demo
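If you just want a quick standalone look at a single frame outside the repository's demo scripts, a minimal Open3D viewer such as the one below can display a point cloud together with predicted or ground-truth boxes. The `.bin` layout (x, y, z, intensity) and the box format are assumptions following common KITTI/OpenPCDet conventions, not a guaranteed match for every dataset used here.

```python
import numpy as np
import open3d as o3d

def show_frame(bin_path, boxes=None):
    """Display one LiDAR frame; boxes is an optional (M, 7) array of
    [x, y, z, l, w, h, heading] (assumed OpenPCDet-style layout)."""
    points = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)[:, :3]
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points.astype(np.float64))

    geoms = [pcd]
    for box in (boxes if boxes is not None else []):
        center, extent, yaw = box[:3], box[3:6], box[6]
        rot = o3d.geometry.get_rotation_matrix_from_xyz([0.0, 0.0, yaw])
        obb = o3d.geometry.OrientedBoundingBox(center, rot, extent)
        obb.color = (1.0, 0.0, 0.0)  # draw boxes in red
        geoms.append(obb)
    o3d.visualization.draw_geometries(geoms)
```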
Acknowledge
- Our code is heavily based on OpenPCDet v0.5.2. Thanks to the OpenPCDet Development Team for their awesome codebase.
- A Team Home for member information and profiles: Project Link
Technical Papers
@inproceedings{zhang2023uni3d,
title={Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection},
author={Zhang, Bo and Yuan, Jiakang and Shi, Botian and Chen, Tao and Li, Yikang and Qiao, Yu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={9253--9262},
year={2023}
}
@inproceedings{yuan2023bi3d,
title={Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection},
author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={15599--15608},
year={2023}
}
@inproceedings{yuan2023AD-PT,
title={AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset},
author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
booktitle={Advances in Neural Information Processing Systems},
year={2023}
}
@inproceedings{huang2023sug,
title={SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification},
author={Huang, Siyuan and Zhang, Bo and Shi, Botian and Gao, Peng and Li, Yikang and Li, Hongsheng},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
year={2023}
}
@inproceedings{zhang2023resimad,
title={ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation},
author={Zhang, Bo and Cai, Xinyu and Yuan, Jiakang and Yang, Donglin and Guo, Jianfei and Xia, Renqiu and Shi, Botian and Dou, Min and Chen, Tao and Liu, Si and others},
booktitle={International Conference on Learning Representations},
year={2024}
}
@article{yan2023spot,
title={SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving},
author={Yan, Xiangchao and Chen, Runjian and Zhang, Bo and Yuan, Jiakang and Cai, Xinyu and Shi, Botian and Shao, Wenqi and Yan, Junchi and Luo, Ping and Qiao, Yu},
journal={arXiv preprint arXiv:2309.10527},
year={2023}
}