UniHSI
[ICLR 2024 Spotlight] Unified Human-Scene Interaction via Prompted Chain-of-Contacts
Unified Human-Scene Interaction
via Prompted Chain-of-Contacts
Zeqi Xiao
Tai Wang
Jingbo Wang
Jinkun Cao
Wenwei Zhang
Bo Dai
Dahua Lin
Jiangmiao Pang*
Shanghai AI Laboratory · Nanyang Technological University · Carnegie Mellon University
🏠 About
![Dialogue_Teaser](https://github.com/OpenRobotLab/UniHSI/raw/main/assets/teaser.png)
🔥 News
- [2024-04] The data is released.
- [2024-03] The code is released.
- [2024-01] UniHSI is accepted as ICLR 2024 spotlight. Thanks for the recognition!
- [2023-09] We release the paper of UniHSI. Please check the webpage and view our demos!
🔍 Overview
Installation
Download Isaac Gym from the website, then follow the installation instructions.
Once Isaac Gym is installed, install the external dependencies for this repo:
pip install -r requirements.txt
Data Preparation
PartNet
- Download PartNet and ShapeNet V2.
- Save them in the following structure:
data/
├── partnet_origin
│ ├── obj_id1
│ ├── obj_id2
│ ├── ...
├── shapenet_origin
│ ├── class_id1
│ │ ├── obj_id1
│ │ ├── ...
│ ├── class_id2
│ │ ├── obj_id1
│ │ ├── ...
│ ├── ...
- Extract the objects used in the sceneplan by running:
python cp_partnet_train.py
python cp_partnet_test.py
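The extraction scripts assume the directory tree above is already in place. A small sanity check can catch layout mistakes before running them; this is our own sketch (`check_layout` is a hypothetical helper, not part of the repo), shown here against a throwaway root instead of the real `data/` folder:

```shell
#!/bin/sh
# Sketch (not part of the repo): verify that the partnet_origin/ and
# shapenet_origin/ folders from the tree above exist under a data root.
check_layout() {
    root="$1"
    for d in "$root/partnet_origin" "$root/shapenet_origin"; do
        [ -d "$d" ] || { echo "missing: $d" >&2; return 1; }
    done
    echo "layout OK under $root"
}

# Demo against a throwaway root; point it at data/ in practice.
tmp=$(mktemp -d)
mkdir -p "$tmp/partnet_origin" "$tmp/shapenet_origin"
check_layout "$tmp"
rm -rf "$tmp"
```

The same idea applies to the ScanNet layout below, with `scan_origin/scans` in place of the two folders checked here.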
ScanNet
- Download ScanNet.
- Save it in the following structure:
data/
├── scan_origin
│ ├── scans
│ │ ├── scans_1
│ │ ├── scans_2
│ │ ├── ...
- Extract the objects used in the sceneplan by running:
python cp_scannet_test.py
Motion Clips
We select and process motion clips from SAMP and CIRCLE.
Training
We adopt step-by-step training: run the following scripts in order, from simple to hard.
sh train_partnet_simple.sh
sh train_partnet_mid.sh
sh train_partnet_hard.sh
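Since the three stages form a simple-to-hard curriculum, the order matters. A minimal wrapper that runs them in sequence and stops at the first failure might look like this (`run_stages` is our own helper name, assuming the stage scripts sit in the current directory):

```shell
#!/bin/sh
# Sketch (not part of the repo): run the curriculum stages in order and
# abort at the first failing stage. Missing scripts are reported and
# skipped so the sketch stays runnable outside the repo root.
run_stages() {
    for stage in train_partnet_simple.sh train_partnet_mid.sh train_partnet_hard.sh; do
        if [ -f "$stage" ]; then
            echo "=== $stage ==="
            sh "$stage" || return 1   # stop the curriculum on failure
        else
            echo "skip: $stage (not found)"
        fi
    done
}

run_stages
```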
Demo
sh demo_scannet.sh
🔗 Citation
If you find our work helpful, please cite:
@inproceedings{
xiao2024unified,
title={Unified Human-Scene Interaction via Prompted Chain-of-Contacts},
author={Zeqi Xiao and Tai Wang and Jingbo Wang and Jinkun Cao and Wenwei Zhang and Bo Dai and Dahua Lin and Jiangmiao Pang},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=1vCnDyQkjg}
}
📄 License
This work is under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
👏 Acknowledgements
- ASE: Our codebase is built upon the AMP implementation in ASE.
- PartNet and ShapeNet: We use objects from PartNet and ShapeNet for training and evaluation.
- ScanNet: We use scenarios from ScanNet for evaluation.
- SAMP: We use motion clips from SAMP for training.
- CIRCLE: We use motion clips from CIRCLE for training.