3DTrans: An Open-source Codebase for Continuous Learning towards Autonomous Driving Task

3DTrans is an open-source codebase for exploring autonomous driving pre-training. It includes Transfer Learning Techniques and Scalable Pre-training Techniques for tackling the continuous learning problem in autonomous driving, as follows.
- We implement the Transfer Learning Techniques, which consist of four settings:
  - Unsupervised Domain Adaptation (UDA) for 3D Point Clouds
  - Active Domain Adaptation (ADA) for 3D Point Clouds
  - Semi-Supervised Domain Adaptation (SSDA) for 3D Point Clouds
  - Multi-dataset Domain Fusion (MDF) for 3D Point Clouds
- We implement the Scalable Pre-training Techniques, which can continuously enhance model performance on downstream tasks as more pre-training data are fed into the pre-training network:
  - AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset
  - SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving
Overview
- News
- Installation for 3DTrans
- Getting Started
- Transfer Learning Techniques@3DTrans
  - Model Zoo: Domain Transfer Results
- Scalable Pre-training Techniques@3DTrans
  - Model Zoo: AD-PT Results
  - ReSimAD
- Visualization Tools for 3DTrans
- 3DTrans Framework Introduction
- Acknowledge
- Citation
News :fire:
- [x] We have released all code of AD-PT here, including: 1) pre-training and fine-tuning methods, 2) labeled and pseudo-labeled data, and 3) pre-trained checkpoints for fine-tuning. Please see AD-PT for more technical details (updated on Sep. 2023).
- [x] SPOT shows that occupancy prediction is a promising pre-training method for general and scalable 3D representation learning; see Figure 1 of the SPOT paper for the experimental results (updated on Sep. 2023).
- [x] We have released the Reconstruction-Simulation Dataset obtained using the ReSimAD method (updated on Sep. 2023).
- [x] We have released the AD-PT pre-trained checkpoints; see AD-PT pre-trained checkpoints (updated on Aug. 2023).
- [x] Based on 3DTrans, we achieved significant performance gains on a series of downstream perception benchmarks, including Waymo, nuScenes, and KITTI, under different baseline models such as PV-RCNN++, SECOND, CenterPoint, and PV-RCNN (updated on Jun. 2023).
- [x] Our 3DTrans supports Semi-Supervised Domain Adaptation (SSDA) for 3D object detection (updated on Nov. 2022).
- [x] Our 3DTrans supports Active Domain Adaptation (ADA) for 3D object detection, achieving a good trade-off between high performance and annotation cost (updated on Oct. 2022).
- [x] Our 3DTrans supports several typical transfer learning techniques (such as TQS, CLUE, SN, ST3D, Pseudo-labeling, SESS, and Mean-Teacher) for autonomous-driving-related model adaptation and transfer.
- [x] Our 3DTrans supports Multi-dataset Domain Fusion (MDF) for 3D object detection, enabling existing 3D models to effectively learn from multiple off-the-shelf 3D datasets (updated on Sep. 2022).
- [x] Our 3DTrans supports Unsupervised Domain Adaptation (UDA) for 3D object detection, deploying a well-trained source model to an unlabeled target domain (updated on Jul. 2022).
- [x] We calculate the object-size distribution for each public AD dataset in object-size statistics.
We hope this repository will inspire further research on 3D model generalization by pushing the limits of perception performance. :tokyo_tower:
Installation for 3DTrans
You may refer to INSTALL.md for the installation of 3DTrans.
Getting Started
Getting Started for ALL Settings
- Please refer to Readme for Datasets to prepare the datasets and convert the data into the 3DTrans format. Besides, 3DTrans supports reading and writing data from Ceph Petrel-OSS; please refer to Readme for Datasets for more details (a minimal config-loading sketch is given after this list).
- Please refer to Readme for UDA for the problem definition of UDA and for performing the UDA adaptation process.
- Please refer to Readme for ADA for the problem definition of ADA and for performing the ADA adaptation process.
- Please refer to Readme for SSDA for the problem definition of SSDA and for performing the SSDA adaptation process.
- Please refer to Readme for MDF for the problem definition of MDF and for performing the MDF joint-training process.
- Please refer to Readme for ReSimAD for the ReSimAD implementation.
- Please refer to Readme for AD-PT Pre-training to start the journey of 3D perception pre-training using AD-PT.
- Please refer to Readme for PointContrast Pre-training for 3D perception pre-training using PointContrast.
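For orientation, the snippet below is a minimal sketch of the config-driven, OpenPCDet-style workflow that 3DTrans builds on: load a YAML config, build the dataloader and the network, and hand them to the provided train/test scripts or your own loop. The config path and batch size are placeholders, and the assumption that 3DTrans keeps OpenPCDet's `pcdet` package layout is exactly that, an assumption; please follow the per-setting readmes above for the exact commands.

```python
# Hypothetical sketch of the OpenPCDet-style workflow that 3DTrans builds on.
# The config path below is a placeholder; use one of the repository's YAML files.
from pcdet.config import cfg, cfg_from_yaml_file  # assumed to follow OpenPCDet's layout
from pcdet.datasets import build_dataloader
from pcdet.models import build_network

cfg_from_yaml_file('tools/cfgs/path_to_your_config.yaml', cfg)  # placeholder path

train_set, train_loader, train_sampler = build_dataloader(
    dataset_cfg=cfg.DATA_CONFIG,
    class_names=cfg.CLASS_NAMES,
    batch_size=4,          # placeholder batch size
    dist=False,            # single-GPU for simplicity
    workers=4,
    training=True,
)

model = build_network(model_cfg=cfg.MODEL, num_class=len(cfg.CLASS_NAMES), dataset=train_set)
model.cuda()  # assumes a GPU is available
# ...plug model and train_loader into the provided train/test scripts or your own loop.
```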
Model Zoo
We cannot provide the Waymo-related pre-trained models due to the Waymo Dataset License Agreement, but you can easily achieve similar performance by training with the corresponding configs.
Domain Transfer Results
UDA Results
Here, we report the cross-dataset (Waymo-to-KITTI) adaptation results using the BEV/3D AP performance as the evaluation metric. Please refer to Readme for UDA for experimental results of more cross-domain settings.
- All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
- For Waymo dataset training, we train the model using 20% of the data.
- The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
- Pre-SN means that we perform the SN (statistical normalization) operation during the stage of pre-training the source-only model.
- Post-SN means that we perform the SN (statistical normalization) operation during the adaptation stage (a minimal SN sketch is given below the table).
| Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download |
|---|---|---|---|---|
| PointPillar | ~7.1 hours | Source-only with SN | 74.98 / 49.31 | - |
| PointPillar | ~0.6 hours | Pre-SN | 81.71 / 57.11 | model-57M |
| PV-RCNN | ~23 hours | Source-only with SN | 69.92 / 60.17 | - |
| PV-RCNN | ~23 hours | Source-only | 74.42 / 40.35 | - |
| PV-RCNN | ~3.5 hours | Pre-SN | 84.00 / 74.57 | model-156M |
| PV-RCNN | ~1.0 hours | Post-SN | 84.94 / 75.20 | model-156M |
| Voxel R-CNN | ~16 hours | Source-only with SN | 75.83 / 55.50 | - |
| Voxel R-CNN | ~16 hours | Source-only | 64.88 / 19.90 | - |
| Voxel R-CNN | ~2.5 hours | Pre-SN | 82.56 / 67.32 | model-201M |
| Voxel R-CNN | ~2.2 hours | Post-SN | 85.44 / 76.78 | model-201M |
| PV-RCNN++ | ~20 hours | Source-only with SN | 67.22 / 56.50 | - |
| PV-RCNN++ | ~20 hours | Source-only | 67.68 / 20.82 | - |
| PV-RCNN++ | ~2.2 hours | Post-SN | 86.86 / 79.86 | model-193M |
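For reference, SN (statistical normalization) rescales each labeled source object, i.e. its bounding box and the LiDAR points inside it, by the gap between the source and target mean object sizes, so that the source data better matches target-domain object statistics. The function below is a simplified, standalone sketch of that idea (boxes assumed in the `[x, y, z, l, w, h, heading]` layout with a centered z); it is not the exact SN implementation shipped with 3DTrans.

```python
import numpy as np

def statistical_normalization(points, gt_boxes, size_delta):
    """Simplified SN sketch: resize each GT box (and the points inside it) by the
    per-dimension gap between target and source mean object size.

    points:     (N, 3+) point cloud, xyz in the first three columns
    gt_boxes:   (M, 7) boxes as [x, y, z, l, w, h, heading]
    size_delta: (3,) target_mean_size - source_mean_size, i.e. [dl, dw, dh]
    """
    points, gt_boxes = points.copy(), gt_boxes.copy()
    for box in gt_boxes:
        cx, cy, cz, l, w, h, yaw = box
        # transform points into the box's local frame
        shifted = points[:, :3] - np.array([cx, cy, cz])
        cos_a, sin_a = np.cos(-yaw), np.sin(-yaw)
        local = shifted.copy()
        local[:, 0] = cos_a * shifted[:, 0] - sin_a * shifted[:, 1]
        local[:, 1] = sin_a * shifted[:, 0] + cos_a * shifted[:, 1]
        mask = (np.abs(local[:, 0]) <= l / 2) & (np.abs(local[:, 1]) <= w / 2) \
               & (np.abs(local[:, 2]) <= h / 2)
        # scale the in-box points so they fill the resized box
        scale = (np.array([l, w, h]) + size_delta) / np.array([l, w, h])
        local[mask] *= scale
        # transform back to the LiDAR frame
        cos_b, sin_b = np.cos(yaw), np.sin(yaw)
        back = local.copy()
        back[:, 0] = cos_b * local[:, 0] - sin_b * local[:, 1]
        back[:, 1] = sin_b * local[:, 0] + cos_b * local[:, 1]
        points[mask, :3] = back[mask] + np.array([cx, cy, cz])
        box[3:6] += size_delta  # resize the box itself
    return points, gt_boxes
```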
ADA Results
Here, we report the Waymo-to-KITTI adaptation results using the BEV/3D AP performance. Please refer to Readme for ADA for experimental results of more cross-domain settings.
- All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
- For Waymo dataset training, we train the model using 20% of the data.
- The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
| Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download |
|---|---|---|---|---|
| PV-RCNN | ~23h@4 A100 | Source Only | 67.95 / 27.65 | - |
| PV-RCNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 87.12 / 78.03 | Model-58M |
| PV-RCNN | ~10h@2 A100 | Bi3D (5% annotation budget) | 89.53 / 81.32 | Model-58M |
| PV-RCNN | ~1.5h@2 A100 | TQS | 82.00 / 72.04 | Model-58M |
| PV-RCNN | ~1.5h@2 A100 | CLUE | 82.13 / 73.14 | Model-50M |
| PV-RCNN | ~10h@2 A100 | Bi3D+ST3D | 87.83 / 81.23 | Model-58M |
| Voxel R-CNN | ~16h@4 A100 | Source Only | 64.87 / 19.90 | - |
| Voxel R-CNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 88.09 / 79.14 | Model-72M |
| Voxel R-CNN | ~6h@2 A100 | Bi3D (5% annotation budget) | 90.18 / 81.34 | Model-72M |
| Voxel R-CNN | ~1.5h@2 A100 | TQS | 78.26 / 67.11 | Model-72M |
| Voxel R-CNN | ~1.5h@2 A100 | CLUE | 81.93 / 70.89 | Model-72M |
SSDA Results
We report the target-domain results for Waymo-to-nuScenes adaptation using the BEV/3D AP performance as the evaluation metric, and for Waymo-to-ONCE adaptation using the ONCE evaluation metric. Please refer to Readme for SSDA for experimental results of more cross-domain settings.
- The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
- For Waymo dataset training, we train the model using 20% of the data.
- second_5%_FT denotes that we fine-tune the Second model using 5% of the nuScenes training data.
- second_5%_SESS denotes that we adapt the baseline model using the SESS (Self-Ensembling Semi-Supervised) method.
- second_5%_PS denotes that we fine-tune the source-only model on nuScenes using 5% labeled data and perform pseudo-labeling on the remaining 95% unlabeled nuScenes data (a minimal pseudo-labeling sketch is given below the table).
| Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download |
|---|---|---|---|---|
| Second | ~11 hours | source-only (Waymo) | 27.85 / 16.43 | - |
| Second | ~0.4 hours | second_5%_FT | 45.95 / 26.98 | model-61M |
| Second | ~1.8 hours | second_5%_SESS | 47.77 / 28.74 | model-61M |
| Second | ~1.7 hours | second_5%_PS | 47.72 / 29.37 | model-61M |
| PV-RCNN | ~24 hours | source-only (Waymo) | 40.31 / 23.32 | - |
| PV-RCNN | ~1.0 hours | pvrcnn_5%_FT | 49.58 / 34.86 | model-150M |
| PV-RCNN | ~5.5 hours | pvrcnn_5%_SESS | 49.92 / 35.28 | model-150M |
| PV-RCNN | ~5.4 hours | pvrcnn_5%_PS | 49.84 / 35.07 | model-150M |
| PV-RCNN++ | ~16 hours | source-only (Waymo) | 31.96 / 19.81 | - |
| PV-RCNN++ | ~1.2 hours | pvplus_5%_FT | 49.94 / 34.28 | model-185M |
| PV-RCNN++ | ~4.2 hours | pvplus_5%_SESS | 51.14 / 35.25 | model-185M |
| PV-RCNN++ | ~3.6 hours | pvplus_5%_PS | 50.84 / 35.39 | model-185M |
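For reference, the `_PS` rows follow the usual pseudo-labeling recipe: run the fine-tuned detector on the unlabeled split, keep only high-confidence predictions as pseudo ground truth, and re-train on the union of labeled and pseudo-labeled frames. The sketch below covers only the filtering step, with an assumed OpenPCDet-style prediction layout and a hand-picked score threshold; it is not the exact 3DTrans implementation.

```python
import torch

@torch.no_grad()
def generate_pseudo_labels(model, unlabeled_loader, score_thresh=0.6):
    """Schematic pseudo-labeling pass: keep confident predictions as labels.

    Assumes each prediction dict holds 'pred_boxes' (M, 7), 'pred_scores' (M,)
    and 'pred_labels' (M,), i.e. the usual OpenPCDet-style output layout, and
    that batches are already on the right device.
    """
    model.eval()
    pseudo_labels = {}
    for batch_dict in unlabeled_loader:
        pred_dicts, _ = model(batch_dict)  # forward pass in eval mode
        for frame_id, preds in zip(batch_dict['frame_id'], pred_dicts):
            keep = preds['pred_scores'] > score_thresh
            pseudo_labels[frame_id] = {
                'gt_boxes': preds['pred_boxes'][keep].cpu(),
                'gt_labels': preds['pred_labels'][keep].cpu(),
                'scores': preds['pred_scores'][keep].cpu(),
            }
    return pseudo_labels  # merged with the 5% labeled split for re-training
```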
- For Waymo-to-ONCE adaptation, we employ 8 NVIDIA A100 GPUs for model training.
- PS denotes that we pseudo-label the unlabeled ONCE data and re-train the model on the pseudo-labeled data.
- SESS denotes that we utilize the SESS method to adapt the baseline (a minimal mean-teacher sketch is given below the table).
- For ONCE, the IoU thresholds for evaluation are 0.7, 0.3, and 0.5 for Vehicle, Pedestrian, and Cyclist, respectively.
| Model | Training ONCE Data | Methods | Vehicle@AP | Pedestrian@AP | Cyclist@AP | Download |
|---|---|---|---|---|---|---|
| CenterPoint | Labeled (4K) | Train from scratch | 74.93 | 46.21 | 67.36 | model-96M |
| CenterPoint_Pede | Labeled (4K) | PS | - | 49.14 | - | model-96M |
| PV-RCNN++ | Labeled (4K) | Train from scratch | 79.78 | 35.91 | 63.18 | model-188M |
| PV-RCNN++ | Small Dataset (100K) | SESS | 80.02 | 46.24 | 66.41 | model-188M |
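For reference, SESS follows a mean-teacher scheme: a teacher copy of the detector is updated as an exponential moving average (EMA) of the student's weights and provides consistency targets on unlabeled data. The sketch below shows only the EMA update, which is the core of the scheme; the consistency losses are omitted and the momentum value is illustrative.

```python
import torch

@torch.no_grad()
def update_teacher_ema(teacher, student, momentum=0.999):
    """Mean-teacher EMA update (core of SESS-style self-ensembling):
    teacher_param <- m * teacher_param + (1 - m) * student_param.
    """
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)
    # buffers (e.g. BatchNorm running stats) are usually copied directly
    for t_buf, s_buf in zip(teacher.buffers(), student.buffers()):
        t_buf.copy_(s_buf)
```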
MDF Results
Here, we report the Waymo-and-nuScenes consolidation results. The models are jointly trained on the Waymo and nuScenes datasets, and evaluated on Waymo using the mAP/mAPH (LEVEL_2) metric and on nuScenes using the BEV/3D AP metric. Please refer to Readme for MDF for more results.
- All LiDAR-based models are trained with 8 NVIDIA A100 GPUs and are available for download.
- The multi-domain dataset fusion (MDF) training time is measured with 8 NVIDIA A100 GPUs and PyTorch 1.8.1.
- For Waymo dataset training, we train the model using 20% of the training data to save training time.
- PV-RCNN-nuScenes means that we train the PV-RCNN model using only the nuScenes dataset, and PV-RCNN-DM indicates that we merge the Waymo and nuScenes datasets and train on the merged dataset. Besides, PV-RCNN-DT denotes the domain attention-aware multi-dataset training (a minimal domain-attention sketch is given after the tables below).
Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
---|---|---|---|---|---|---|---|
PV-RCNN-nuScenes | only nuScenes | 35.59 / 35.21 | 3.95 / 2.55 | 0.94 / 0.92 | 57.78 / 41.10 | 24.52 / 18.56 | 10.24 / 8.25 |
PV-RCNN-Waymo | only Waymo | 66.49 / 66.01 | 64.09 / 58.06 | 62.09 / 61.02 | 32.99 / 17.55 | 3.34 / 1.94 | 0.02 / 0.01 |
PV-RCNN-DM | Direct Merging | 57.82 / 57.40 | 48.24 / 42.81 | 54.63 / 53.64 | 48.67 / 30.43 | 12.66 / 8.12 | 1.67 / 1.04 |
PV-RCNN-Uni3D | Uni3D | 66.98 / 66.50 | 65.70 / 59.14 | 61.49 / 60.43 | 60.77 / 42.66 | 27.44 / 21.85 | 13.50 / 11.87 |
PV-RCNN-DT | Domain Attention | 67.27 / 66.77 | 65.86 / 59.38 | 61.38 / 60.34 | 60.83 / 43.03 | 27.46 / 22.06 | 13.82 / 11.52 |
Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
---|---|---|---|---|---|---|---|
Voxel-RCNN-nuScenes | only nuScenes | 31.89 / 31.65 | 3.74 / 2.57 | 2.41 / 2.37 | 53.63 / 39.05 | 22.48 / 17.85 | 10.86 / 9.70 |
Voxel-RCNN-Waymo | only Waymo | 67.05 / 66.41 | 66.75 / 60.83 | 63.13 / 62.15 | 34.10 / 17.31 | 2.99 / 1.69 | 0.05 / 0.01 |
Voxel-RCNN-DM | Direct Merging | 58.26 / 57.87 | 52.72 / 47.11 | 50.26 / 49.50 | 51.40 / 31.68 | 15.04 / 9.99 | 5.40 / 3.87 |
Voxel-RCNN-Uni3D | Uni3D | 66.76 / 66.29 | 66.62 / 60.51 | 63.36 / 62.42 | 60.18 / 42.23 | 30.08 / 24.37 | 14.60 / 12.32 |
Voxel-RCNN-DT | Domain Attention | 66.96 / 66.50 | 68.23 / 62.00 | 62.57 / 61.64 | 60.42 / 42.81 | 30.49 / 24.92 | 15.91 / 13.35 |
Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
---|---|---|---|---|---|---|---|
PV-RCNN++-DM | Direct Merging | 63.79 / 63.38 | 55.03 / 49.75 | 59.88 / 58.99 | 50.91 / 31.46 | 17.07 / 12.15 | 3.10 / 2.20 |
PV-RCNN++-Uni3D | Uni3D | 68.55 / 68.08 | 69.83 / 63.60 | 64.90 / 63.91 | 62.51 / 44.16 | 33.82 / 27.18 | 22.48 / 19.30 |
PV-RCNN++-DT | Domain Attention | 68.51 / 68.05 | 69.81 / 63.58 | 64.39 / 63.43 | 62.33 / 44.16 | 33.44 / 26.94 | 21.64 / 18.52 |
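For reference, the "Domain Attention" rows use dataset-aware channel re-weighting: each dataset gets its own lightweight SE-style excitation branch on top of shared features, so the shared backbone can adapt its channels per domain. The module below is a minimal sketch of that general idea under assumed shapes (BEV feature maps of shape (B, C, H, W)); it is not the exact Uni3D/3DTrans implementation.

```python
import torch
import torch.nn as nn

class DomainAttention(nn.Module):
    """Minimal dataset-conditioned SE block: shared features, per-dataset excitation."""

    def __init__(self, channels, num_datasets, reduction=4):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )
            for _ in range(num_datasets)
        ])

    def forward(self, feats, dataset_id):
        # feats: (B, C, H, W) shared BEV features; dataset_id selects the branch
        squeeze = feats.mean(dim=(2, 3))             # (B, C) global average pooling
        excite = self.branches[dataset_id](squeeze)  # (B, C) per-dataset channel gates
        return feats * excite.unsqueeze(-1).unsqueeze(-1)

# e.g. attn = DomainAttention(channels=256, num_datasets=2)
#      out = attn(bev_features, dataset_id=0)  # 0 = Waymo, 1 = nuScenes (illustrative)
```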
3D Pre-training Results
AD-PT Results on Waymo
AD-PT demonstrates strong generalization ability on 3D point clouds. We first pre-train the 3D and 2D backbones using AD-PT on the ONCE dataset (from 100K to 1M frames), and then fine-tune the model on different datasets. Here, we report the results of fine-tuning on Waymo (a minimal sketch of loading a pre-trained backbone is given below the table).
| Model | Data amount | Overall | Vehicle | Pedestrian | Cyclist |
|---|---|---|---|---|---|
| SECOND (From scratch) | 3% | 52.00 / 37.70 | 58.11 / 57.44 | 51.34 / 27.38 | 46.57 / 28.28 |
| SECOND (AD-PT) | 3% | 55.41 / 51.78 | 60.53 / 59.93 | 54.91 / 45.78 | 50.79 / 49.65 |
| SECOND (From scratch) | 20% | 60.62 / 56.86 | 64.26 / 63.73 | 59.72 / 50.38 | 57.87 / 56.48 |
| SECOND (AD-PT) | 20% | 61.26 / 57.69 | 64.54 / 64.00 | 60.25 / 51.21 | 59.00 / 57.86 |
| CenterPoint (From scratch) | 3% | 59.00 / 56.29 | 57.12 / 56.57 | 58.66 / 52.44 | 61.24 / 59.89 |
| CenterPoint (AD-PT) | 3% | 61.21 / 58.46 | 60.35 / 59.79 | 60.57 / 54.02 | 62.73 / 61.57 |
| CenterPoint (From scratch) | 20% | 66.47 / 64.01 | 64.91 / 64.42 | 66.03 / 60.34 | 68.49 / 67.28 |
| CenterPoint (AD-PT) | 20% | 67.17 / 64.65 | 65.33 / 64.83 | 67.16 / 61.20 | 69.39 / 68.25 |
| PV-RCNN++ (From scratch) | 3% | 63.81 / 61.10 | 64.42 / 63.93 | 64.33 / 57.79 | 62.69 / 61.59 |
| PV-RCNN++ (AD-PT) | 3% | 68.33 / 65.69 | 68.17 / 67.70 | 68.82 / 62.39 | 68.00 / 67.00 |
| PV-RCNN++ (From scratch) | 20% | 69.97 / 67.58 | 69.18 / 68.75 | 70.88 / 65.21 | 69.84 / 68.77 |
| PV-RCNN++ (AD-PT) | 20% | 71.55 / 69.23 | 70.62 / 70.19 | 72.36 / 66.82 | 71.69 / 70.70 |
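For reference, fine-tuning from an AD-PT checkpoint amounts to initializing the detector's backbone with the pre-trained weights and then training as usual on the labeled subset. The sketch below shows a generic partial `state_dict` load in plain PyTorch; the checkpoint path and key layout are assumptions, so follow Readme for AD-PT Pre-training for the exact procedure.

```python
import torch

def load_pretrained_backbone(model, ckpt_path='ad_pt_checkpoint.pth'):
    """Generic partial-weight loading: copy every pre-trained tensor whose name
    and shape match the detector, leave the detection head randomly initialized."""
    checkpoint = torch.load(ckpt_path, map_location='cpu')
    pretrained = checkpoint.get('model_state', checkpoint)  # key layout is an assumption
    model_state = model.state_dict()
    matched = {
        k: v for k, v in pretrained.items()
        if k in model_state and v.shape == model_state[k].shape
    }
    model_state.update(matched)
    model.load_state_dict(model_state)
    print(f'Loaded {len(matched)} / {len(model_state)} tensors from {ckpt_path}')
    return model
```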
ReSimAD
ReSimAD Implementation
Here, we provide the download link of our reconstruction-simulation dataset produced by ReSimAD, consisting of nuScenes-like, KITTI-like, ONCE-like, and Waymo-like datasets of target-domain-like simulated points.
Specifically, please refer to ReSimAD reconstruction for the point-based reconstruction meshes, and to PCSim for the technical details of simulating target-domain-like points based on the reconstructed meshes (a minimal ray-casting sketch is given below the table). For the perception module, please refer to PV-RCNN and PV-RCNN++ for model training and evaluation.
We report the zero-shot cross-dataset (Waymo-to-nuScenes) adaptation results using the BEV/3D AP performance as the evaluation metric for a fair comparison. Please refer to ReSimAD for more details.
| Methods | Training time | Adaptation | Car@R40 (BEV / 3D) | Ckpt |
|---|---|---|---|---|
| PV-RCNN | ~23 hours | Source-only | 31.02 / 17.75 | Not Available (Waymo License) |
| PV-RCNN | ~8 hours | ST3D | 36.42 / 22.99 | - |
| PV-RCNN | ~8 hours | ReSimAD | 37.85 / 21.33 | ReSimAD_ckpt |
| PV-RCNN++ | ~20 hours | Source-only | 29.93 / 18.77 | Not Available (Waymo License) |
| PV-RCNN++ | ~2.2 hours | ST3D | 34.68 / 17.17 | - |
| PV-RCNN++ | ~8 hours | ReSimAD | 40.73 / 23.72 | ReSimAD_ckpt |
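For reference, the simulation half of ReSimAD re-renders LiDAR-like points by casting rays from a virtual sensor against the reconstructed meshes, using a target-domain-like beam pattern. The snippet below is a heavily simplified, standalone sketch of that idea using Open3D's ray-casting API; the mesh path, beam layout, and ranges are all illustrative and do not reproduce the actual ReSimAD/PCSim pipeline.

```python
import numpy as np
import open3d as o3d

def simulate_lidar_from_mesh(mesh_path, num_beams=32, num_azimuth=1024,
                             fov_up=10.0, fov_down=-30.0, max_range=80.0):
    """Cast a simple rotating-beam pattern against a reconstructed mesh and
    return the hit points as a pseudo LiDAR sweep (sensor at the origin)."""
    mesh = o3d.io.read_triangle_mesh(mesh_path)
    scene = o3d.t.geometry.RaycastingScene()
    scene.add_triangles(o3d.t.geometry.TriangleMesh.from_legacy(mesh))

    # build ray directions for a crude spinning-LiDAR beam pattern
    elev = np.deg2rad(np.linspace(fov_down, fov_up, num_beams))
    azim = np.linspace(-np.pi, np.pi, num_azimuth, endpoint=False)
    elev, azim = np.meshgrid(elev, azim, indexing='ij')
    dirs = np.stack([np.cos(elev) * np.cos(azim),
                     np.cos(elev) * np.sin(azim),
                     np.sin(elev)], axis=-1).reshape(-1, 3)

    rays = np.concatenate([np.zeros_like(dirs), dirs], axis=-1).astype(np.float32)
    hits = scene.cast_rays(o3d.core.Tensor(rays))
    t_hit = hits['t_hit'].numpy()
    valid = np.isfinite(t_hit) & (t_hit < max_range)
    return dirs[valid] * t_hit[valid, None]  # (N, 3) simulated points
```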
Visualization Tools for 3DTrans
- Our 3DTrans supports the sequence-level visualization function Quick Sequence Demo to continuously display the prediction results or ground truth of a selected scene (a minimal standalone viewer sketch is given after the demo list below).
Visualization Demo
- Waymo Sequence-level Visualization Demo1
- Waymo Sequence-level Visualization Demo2
- nuScenes Sequence-level Visualization Demo
- ONCE Sequence-level Visualization Demo
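If you just want a quick standalone look at a single frame outside the repository's demo scripts, a minimal Open3D viewer such as the one below can display a point cloud together with predicted or ground-truth boxes. The `.bin` layout (x, y, z, intensity) and the box format are assumptions following common KITTI/OpenPCDet conventions, not a guaranteed match for every dataset used here.

```python
import numpy as np
import open3d as o3d

def show_frame(bin_path, boxes=None):
    """Display one LiDAR frame; boxes is an optional (M, 7) array of
    [x, y, z, l, w, h, heading] (assumed OpenPCDet-style layout)."""
    points = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)[:, :3]
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points.astype(np.float64))

    geoms = [pcd]
    for box in (boxes if boxes is not None else []):
        center, extent, yaw = box[:3], box[3:6], box[6]
        rot = o3d.geometry.get_rotation_matrix_from_xyz([0.0, 0.0, yaw])
        obb = o3d.geometry.OrientedBoundingBox(center, rot, extent)
        obb.color = (1.0, 0.0, 0.0)  # draw boxes in red
        geoms.append(obb)
    o3d.visualization.draw_geometries(geoms)
```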
Acknowledge
- Our code is heavily based on OpenPCDet v0.5.2. Thanks to the OpenPCDet Development Team for their awesome codebase.
- A Team Home for member information and profiles: Project Link
Technical Papers
@inproceedings{zhang2023uni3d,
title={Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection},
author={Zhang, Bo and Yuan, Jiakang and Shi, Botian and Chen, Tao and Li, Yikang and Qiao, Yu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={9253--9262},
year={2023}
}
@inproceedings{yuan2023bi3d,
title={Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection},
author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={15599--15608},
year={2023}
}
@inproceedings{yuan2023AD-PT,
title={AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset},
author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
booktitle={Advances in Neural Information Processing Systems},
year={2023}
}
@inproceedings{huang2023sug,
title={SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification},
author={Huang, Siyuan and Zhang, Bo and Shi, Botian and Gao, Peng and Li, Yikang and Li, Hongsheng},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
year={2023}
}
@inproceedings{zhang2023resimad,
title={ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation},
author={Zhang, Bo and Cai, Xinyu and Yuan, Jiakang and Yang, Donglin and Guo, Jianfei and Xia, Renqiu and Shi, Botian and Dou, Min and Chen, Tao and Liu, Si and others},
booktitle={International Conference on Learning Representations},
year={2024}
}
@article{yan2023spot,
title={SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving},
author={Yan, Xiangchao and Chen, Runjian and Zhang, Bo and Yuan, Jiakang and Cai, Xinyu and Shi, Botian and Shao, Wenqi and Yan, Junchi and Luo, Ping and Qiao, Yu},
journal={arXiv preprint arXiv:2309.10527},
year={2023}
}