3D Scene Diffusion Guidance using Scene Graphs
This repository contains the source code for the paper 3D Scene Diffusion Guidance using Scene Graphs: a diffusion model for conditional 3D scene generation with classifier-free guidance on scene graphs.
Abstract
Guided synthesis of high-quality 3D scenes is a challenging task. Diffusion models have shown promise in generating diverse data, including 3D scenes. However, current methods rely directly on text embeddings for controlling the generation, limiting the incorporation of complex spatial relationships between objects. We propose a novel approach for 3D scene diffusion guidance using scene graphs. To leverage the relative spatial information the scene graphs provide, we make use of relational graph convolutional blocks within our denoising network. We show that our approach significantly improves the alignment between the scene description and the generated scene.
Table of Contents
- Setup
- Usage
- Results
- Acknowledgement
- Citation
Setup
To set up all the necessary dependencies, you can use Conda. Open your terminal and execute the following command:
conda env create -f environment.yml
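Then activate the environment (its name is defined in environment.yml; replace the placeholder below accordingly):
conda activate <env-name>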
Usage
guided-diffusion/inference.ipynb
provides the inference notebook for generating scenes with the trained denoising network.
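Sampling uses classifier-free guidance on the scene-graph condition. The snippet below is a minimal sketch of what such a guidance step typically looks like; `denoiser`, `scene_graph`, and `guidance_scale` are illustrative names, not the notebook's actual interface.

```python
# Sketch of a classifier-free guidance step during sampling.
# All names here are illustrative, not the interface used in
# guided-diffusion/inference.ipynb.
def guided_noise(denoiser, x_t, t, scene_graph, guidance_scale=2.0):
    # Conditional prediction uses the scene graph; the unconditional one drops it.
    eps_cond = denoiser(x_t, t, condition=scene_graph)
    eps_uncond = denoiser(x_t, t, condition=None)
    # Classifier-free guidance: push the prediction towards the condition.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```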
guided-diffusion/main.ipynb
provides the training notebook for the denoising network.
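As noted in the abstract, the denoising network uses relational graph convolutional blocks to exploit the relations encoded in the scene graph. Below is a minimal sketch of such a block built on torch_geometric's RGCNConv; the layer sizes, the timestep conditioning, and all names are assumptions for illustration, not the exact architecture from the notebooks.

```python
# Minimal sketch of an R-GCN-based denoising block (assumed layout, not the
# exact architecture used in guided-diffusion/main.ipynb).
import torch.nn as nn
from torch_geometric.nn import RGCNConv

class RGCNDenoisingBlock(nn.Module):
    def __init__(self, node_dim=300, hidden_dim=256, num_relations=16, time_dim=128):
        super().__init__()
        # Relational graph convolutions learn one weight matrix per relation
        # type, so edges such as "left of" and "on top of" are treated differently.
        self.conv1 = RGCNConv(node_dim, hidden_dim, num_relations)
        self.conv2 = RGCNConv(hidden_dim, hidden_dim, num_relations)
        # Project the diffusion timestep embedding onto the node features.
        self.time_proj = nn.Linear(time_dim, hidden_dim)
        self.act = nn.SiLU()

    def forward(self, x, edge_index, edge_type, t_emb):
        # x: [num_nodes, node_dim] noisy per-object features
        # edge_index: [2, num_edges], edge_type: [num_edges] relation ids
        # t_emb: [time_dim] embedding of the current diffusion timestep
        h = self.act(self.conv1(x, edge_index, edge_type))
        h = h + self.time_proj(t_emb)  # condition on the timestep
        return self.conv2(h, edge_index, edge_type)
```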
[Optional] FastText Embeddings by Facebook Research
We use the FastText embeddings by Facebook Research to embed the scene object descriptions in a more robust way than a standard Word2Vec encoder. With it, the textual description of each object in the scene is embedded into a 300-dimensional vector stored in a node; combining these nodes with their inter-node relations yields the scene graph used as input to the denoising network.
You can download the model binary here and place it in the models folder.
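As a rough illustration, the downloaded binary can be loaded with the official fasttext Python package to produce the 300-dimensional node features; the file name and object labels below are examples, and the actual preprocessing lives in the notebooks.

```python
# Sketch: embedding object descriptions with FastText for scene-graph nodes.
# Assumes the official `fasttext` package and a downloaded .bin model placed
# in the models folder (the file name shown here is an example).
import fasttext
import numpy as np

model = fasttext.load_model("models/cc.en.300.bin")  # 300-dim English vectors

object_labels = ["bed", "nightstand", "wardrobe"]    # example scene objects
node_features = np.stack(
    [model.get_word_vector(label) for label in object_labels]
)  # shape: (num_objects, 300), one feature vector per scene-graph node
```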
[Optional] DVIS Library for 3D Scene Visualization
guided-diffusion/inference.ipynb
contains a code section that uses the DVIS library to visualize the generated scenes in 3D.
Results
Generated Scenes
Below is a 4x4 table giving an overview of the 3D scene synthesis results. Each row depicts a single result: the first column shows a natural-language description of the scene, and the second column shows the corresponding scene graph used as input to the generative process. The remaining two columns show the synthesized 3D scene from the side and top views. The selected examples cover (1) a very complex, (2) a disconnected, (3) a repetitive, and (4) a simple scene graph.
Denoising Process
The following GIF demonstrates the denoising process applied to a single scene:
Acknowledgement
This work was developed with the TUM Visual Computing Group led by Prof. Matthias Niessner. It builds upon DiffuScene, and we thank Yinyu Nie for his great support and supervision.
Citation
If you find this work useful, please cite:
@misc{naanaa20233d,
      title={3D Scene Diffusion Guidance using Scene Graphs},
      author={Mohammad Naanaa and Katharina Schmid and Yinyu Nie},
      year={2023},
      eprint={2308.04468},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}