PLA: Language-Driven Open-Vocabulary 3D Scene Understanding

Runyu Ding¹*, Jihan Yang¹*, Chuhui Xue², Wenqing Zhang², Song Bai²†, Xiaojuan Qi¹†

¹The University of Hong Kong ²ByteDance

*equal contribution †corresponding author

CVPR 2023

TL;DR: PLA leverages powerful vision-language (VL) foundation models to construct hierarchical 3D-text pairs for 3D open-world learning.


working space	piano	vending machine

project page | arXiv

TODO

[ ] Release caption processing code

Getting Started

Installation

Please refer to INSTALL.md for installation instructions.

Dataset Preparation

Please refer to DATASET.md for dataset preparation.

Training & Inference

Please refer to MODEL.md for training and inference scripts and pretrained models.

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{ding2022language,
    title={PLA: Language-Driven Open-Vocabulary 3D Scene Understanding},
    author={Ding, Runyu and Yang, Jihan and Xue, Chuhui and Zhang, Wenqing and Bai, Song and Qi, Xiaojuan},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2023}
}

Acknowledgement

Code is partly borrowed from OpenPCDet, PointGroup and SoftGroup.
