scalingup

[CoRL 2023] This repository contains data generation and training code for Scaling Up & Distilling Down

Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition

Huy Ha$^1$, Pete Florence$^2$, Shuran Song$^1$

$^1$ Columbia University, $^2$ Google DeepMind

Project Page | arXiv | Video

Scaling Up and Distilling Down is a framework for language-guided skill learning. Give it a task description, and it will automatically generate rich, diverse robot trajectories, complete with success labels and dense language labels.

The best part? It uses no expert demonstrations, no manual reward supervision, and no manual language annotation.


This repository contains code for language-guided data generation and language-conditioned diffusion policy training for Scaling Up and Distilling Down. It has been tested on Ubuntu 18.04, 20.04, and 22.04 with the following GPUs: NVIDIA GTX 1080, NVIDIA RTX A6000, NVIDIA GeForce RTX 3080, and NVIDIA GeForce RTX 3090.

If you find this codebase useful, consider citing:

@inproceedings{ha2023scalingup,
      title={Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition}, 
      author={Huy Ha and Pete Florence and Shuran Song},
      year={2023},
      eprint={2307.14535},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}

If you have any questions, please contact me at huy [at] cs [dot] columbia [dot] edu.

Table of Contents

  • ⚙️Setup
  • 🚶Codebase Walkthrough
    • 💡 Core Concepts
      • 🪺 Nested Trajectories using Hierarchical Actions and Policies
      • 🌳 Exploration Task Tree
      • 🌈 Seeded Variation
    • 🎛️ Control
    • 🏃‍♂️ Motion Planning
    • 🗣️ Language Model Queries
      • 💿 Cache
      • 🔗 Linking LLM Modules Together
      • 🪙 Coin flips
  • 🔬 Reproducing
    • 📊 Evaluation
    • 🗄️ Data Generation
    • 🧠 Training
  • 🔭 Extending
    • I want to add more
      • 🤖 Robots
      • 🪑 Assets
        • 📜 Tools & Scripts
      • 🌏 Environments & Tasks
        • New Simulators
        • New Tasks
      • 🦙 Language Models
    • 🖼️ Figure Utilities
      • Efficiency Plot
      • Visualizing Language-conditioned Outputs
    • Development Tips
      • 🐉 Hydra
      • 📷 Headless Rendering
      • 🖧 Multi-processing
      • 👩‍👦‍👶 Typing
      • ⏱️ Profiling
      • 💽 Data Format
      • 💾 RAM Usage
    • 💀 Known Issues
    • Training Tips
      • Mixed-Precision
  • 📽️ Visualizations

Acknowledgements

We would like to thank Cheng Chi, Zeyi Liu, Samir Yitzhak Gadre, Mengda Xu, Zhenjia Xu, Mandi Zhao and Dominik Bauer for their helpful feedback and fruitful discussions.

This work was supported in part by a Google Research Award and NSF Awards #2143601 and #2132519. We would like to thank Google for the UR5 robot hardware. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.

Code