Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition

Huy Ha$^1$, Pete Florence$^2$, Shuran Song$^1$

$^1$ Columbia University, $^2$ Google DeepMind

Scaling Up and Distilling Down is a framework for language-guided skill learning. Give it a task description, and it will automatically generate rich, diverse robot trajectories, complete with success labels and dense language labels.

The best part? It uses no expert demonstrations, no manual reward supervision, and no manual language annotation.

This repository contains code for language-guided data generation and language-conditioned diffusion policy training for Scaling Up and Distilling Down. It has been tested on Ubuntu 18.04, 20.04, and 22.04, and on the following NVIDIA GPUs: GTX 1080, RTX A6000, GeForce RTX 3080, and GeForce RTX 3090.

If you find this codebase useful, consider citing:

@inproceedings{ha2023scalingup,
      title={Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition}, 
      author={Huy Ha and Pete Florence and Shuran Song},
      year={2023},
      eprint={2307.14535},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}

If you have any questions, please contact me at huy [at] cs [dot] columbia [dot] edu.

Table of Contents

⚙️ Setup
🚶 Codebase Walkthrough
- 💡 Core Concepts
  - 🪺 Nested Trajectories using Hierarchical Actions and Policies
  - 🌳 Exploration Task Tree
  - 🌈 Seeded Variation
- 🎛️ Control
- 🏃‍♂️ Motion Planning
- 🗣️ Language Model Queries
  - 💿 Cache
  - 🔗 Linking LLM Modules Together
  - 🪙 Coin flips
🔬 Reproducing
- 📊 Evaluation
- 🗄️ Data Generation
- 🧠 Training
🔭 Extending
- I want to add more
  - 🤖 Robots
  - 🪑 Assets
    - 📜 Tools & Scripts
  - 🌏 Environments & Tasks
    - New Simulators
    - New Tasks
  - 🦙 Language Models
- 🖼️ Figure Utilities
  - Efficiency Plot
  - Visualizing Language-conditioned Outputs
- ✅ Development Tips
  - 🐉 Hydra
  - 📷 Headless Rendering
  - 🖧 Multi-processing
  - 👩‍👦‍👶 Typing
  - ⏱️ Profiling
  - 💽 Data Format
  - 💾 RAM Usage
- 💀 Known Issues
- ✅ Training Tips
  - Mixed-Precision
📽️ Visualizations

Acknowledgements

We would like to thank Cheng Chi, Zeyi Liu, Samir Yitzhak Gadre, Mengda Xu, Zhenjia Xu, Mandi Zhao and Dominik Bauer for their helpful feedback and fruitful discussions.

This work was supported in part by a Google Research Award and NSF Awards #2143601 and #2132519. We would like to thank Google for the UR5 robot hardware. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.

Code

Diffusion Policy: The policy was built on top of their Colab, and our real-world evaluation code was modified from theirs.
Mujoco Menagerie: UR5 and RealSense models were modified from their models.
Mujoco Scanned Objects: Big shout out to Kevin Zakka for all his amazing open-source work. Go give him a few stars ⭐
3D UNet: Modified from Adrian Wolny's implementation.
Support for Fair Innovation FR5 with the Robotiq 852F and Weiss WSG50 was contributed by Yan Wang! Go give him a few stars as well!
