HI-SLAM2
HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction
Wei Zhang · Qing Cheng · David Skuddis · Niclas Zeller · Daniel Cremers · Norbert Haala
Paper | Project Page
HI-SLAM2 constructs a 3DGS map (a) from monocular input, achieving accurate mesh reconstructions (b) and high-quality renderings (c). It surpasses existing monocular SLAM methods in both geometric accuracy and rendering quality while achieving faster runtime.
Table of Contents
- Getting Started
- Data Preparation
- Run Demo
- Run Evaluation
- Semantic Reconstruction
- Acknowledgement
- Citation
Update (12 May 2025)
We have made updates to the CUDA kernel in HI-SLAM2. As a result, it is necessary to recompile the kernel. Please run the following command after pulling the latest changes to ensure everything works properly:
python setup.py install
Getting Started
- Clone the repo with submodules
git clone --recursive https://github.com/Willyzw/HI-SLAM2
- Create a new Conda environment and activate it. Please note that we use a PyTorch build compiled with CUDA 11.8, as specified in the environment.yaml file.
conda env create -f environment.yaml
conda activate hislam2
- Compile the CUDA kernel extensions (takes about 10 minutes). Please note that this process assumes you have CUDA 11 installed, not CUDA 12. To check the installed CUDA version, run nvcc --version in the terminal; see also the snippet after this list for a quick check from within Python.
python setup.py install
- Download the pretrained weights of the Omnidata models, which are used to generate the depth and normal priors
wget https://zenodo.org/records/10447888/files/omnidata_dpt_normal_v2.ckpt -P pretrained_models
wget https://zenodo.org/records/10447888/files/omnidata_dpt_depth_v2.ckpt -P pretrained_models
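Before compiling, it can help to verify that the PyTorch build inside the environment actually matches the CUDA 11 toolkit. A minimal sanity check, assuming you run it inside the activated hislam2 environment:

```python
# Quick sanity check that PyTorch was built against CUDA 11.x before
# compiling the extensions (run inside the activated hislam2 environment).
import torch

print("PyTorch:", torch.__version__)        # e.g. 2.x.x+cu118
print("CUDA build:", torch.version.cuda)    # expected: 11.8
print("GPU available:", torch.cuda.is_available())
assert torch.version.cuda and torch.version.cuda.startswith("11."), \
    "PyTorch was not built with CUDA 11; recreate the environment."
```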
Data Preparation
Replica
Download and prepare the Replica dataset by running
bash scripts/download_replica.sh
python scripts/preprocess_replica.py
where the data is converted to the expected format and placed in the data/Replica folder.
ScanNet
Please follow the instructions in ScanNet to download the data, then put the color images, poses, and intrinsics extracted from the .sens files into the data/ScanNet folder as follows (a sketch of the extraction step is shown after the folder structure):
Folder structure:
scene0000_00
├── color
│   ├── 000000.jpg
│   └── ...
├── intrinsic
│   └── intrinsic_color.txt
└── pose
    ├── 000000.txt
    └── ...
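For the extraction itself, the SensReader tool from the official ScanNet repository can export color images, poses, and intrinsics from a .sens file. A hedged sketch of invoking it (the reader.py flag names follow the ScanNet repository and may vary between versions):

```python
# Hedged sketch: export color/pose/intrinsic data from a .sens file using
# the official ScanNet SensReader (reader.py); flag names may vary by version.
import subprocess

scene = "scene0000_00"  # example scene id
subprocess.run([
    "python", "reader.py",
    "--filename", f"{scene}.sens",
    "--output_path", f"data/ScanNet/{scene}",
    "--export_color_images",
    "--export_poses",
    "--export_intrinsics",
], check=True)
```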
Then run the following script to convert the data to the expected input format
python scripts/preprocess_scannet.py
We take the following sequences for evaluation: scene0000_00, scene0054_00, scene0059_00, scene0106_00, scene0169_00, scene0181_00, scene0207_00, scene0233_00.
Run Demo
After preparing the Replica dataset, you can run HI-SLAM2 for a demo. The demo takes about 2 minutes on an Nvidia RTX 4090 GPU. The result will be saved in the outputs/room0 folder, including the estimated camera poses, the Gaussian map, and the renderings. To visualize the construction process of the Gaussian map, use the --gsvis flag. To visualize intermediate results, e.g. the estimated depth and point cloud, use the --droidvis flag.
python demo.py \
--imagedir data/Replica/room0/colors \
--calib calib/replica.txt \
--config config/replica_config.yaml \
--output outputs/room0 \
[--gsvis] # Optional: Enable Gaussian map display
[--droidvis] # Optional: Enable point cloud display
To generate the TSDF mesh from the reconstructed Gaussian map, you can run
python tsdf_integrate.py --result outputs/room0 --voxel_size 0.01 --weight 2
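Internally, this kind of TSDF integration fuses rendered color and depth frames into a voxel volume and extracts a mesh from it. A minimal sketch with Open3D, assuming rendered RGB-D frames and camera-to-world poses are available (the frames list, file names, and intrinsics below are placeholders; the actual tsdf_integrate.py may differ):

```python
# Hedged sketch of TSDF fusion with Open3D; paths, intrinsics, and the
# `frames` list are placeholders, not HI-SLAM2's actual interface.
import numpy as np
import open3d as o3d

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.01,   # matches --voxel_size 0.01 above
    sdf_trunc=0.04,      # truncation distance (assumed value)
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8,
)
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    width=640, height=480, fx=600.0, fy=600.0, cx=320.0, cy=240.0)

frames = []  # hypothetical list of (color_path, depth_path, camera-to-world pose)
for color_file, depth_file, pose_c2w in frames:
    color = o3d.io.read_image(color_file)
    depth = o3d.io.read_image(depth_file)
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, depth_scale=1000.0, convert_rgb_to_intensity=False)
    # Open3D expects the world-to-camera transform as the extrinsic
    volume.integrate(rgbd, intrinsic, np.linalg.inv(pose_c2w))

mesh = volume.extract_triangle_mesh()
mesh.compute_vertex_normals()
o3d.io.write_triangle_mesh("mesh.ply", mesh)
```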
Run Evaluation
Replica
Run the following script to automate the evaluation process on all sequences of the Replica dataset. It will evaluate the tracking error, rendering quality, and reconstruction accuracy.
python scripts/run_replica.py
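Tracking error is typically reported as the ATE RMSE of the camera positions after aligning the estimated trajectory to the ground truth. A minimal numpy sketch of that metric, using a closed-form Umeyama similarity alignment (the actual evaluation script may handle scale and frame association differently):

```python
# Hedged sketch: ATE RMSE between estimated and ground-truth camera positions
# (both Nx3 arrays) after a closed-form Umeyama similarity alignment.
import numpy as np

def ate_rmse(est, gt):
    mu_e, mu_g = est.mean(0), gt.mean(0)
    e, g = est - mu_e, gt - mu_g          # centered trajectories
    U, S, Vt = np.linalg.svd(g.T @ e)     # cross-covariance (up to 1/N)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt                        # optimal rotation
    s = np.trace(np.diag(S) @ D) / (e ** 2).sum()  # optimal scale
    aligned = s * est @ R.T + (mu_g - s * R @ mu_e)
    return np.sqrt(((aligned - gt) ** 2).sum(axis=1).mean())
```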
ScanNet
Run the following script to automate the evaluation process on the selected 8 sequences of the ScanNet dataset. It will evaluate the tracking error and rendering quality.
python scripts/run_scannet.py
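Rendering quality is commonly measured with PSNR (alongside SSIM and LPIPS). For reference, a minimal PSNR implementation over images normalized to [0, 1]:

```python
# Hedged sketch: PSNR between a rendered image and the ground-truth frame,
# with pixel values normalized to [0, 1].
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    mse = np.mean((rendered.astype(np.float64) - reference.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```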
Run your own data
HI-SLAM2 supports casual video recordings from a smartphone or camera (the demo above was recorded with an iPhone 15). To use your own video data, we provide a preprocessing script that extracts individual frames from your video and runs COLMAP to automatically estimate the camera intrinsics. Run the preprocessing with:
python scripts/preprocess_owndata.py PATH_TO_YOUR_VIDEO PATH_TO_OUTPUT_DIR
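Conceptually, this preprocessing amounts to two steps: sampling frames from the video and running a COLMAP reconstruction to recover the intrinsics. A rough sketch of those steps (paths, sampling rate, and matcher choice are assumptions; the actual preprocess_owndata.py may differ):

```python
# Hedged sketch: extract frames with OpenCV, then run COLMAP to estimate
# camera intrinsics. Paths and the every-5th-frame sampling are placeholders.
import os
import subprocess
import cv2

video, outdir = "my_video.mp4", "output_dir"
os.makedirs(f"{outdir}/images", exist_ok=True)
os.makedirs(f"{outdir}/sparse", exist_ok=True)

cap = cv2.VideoCapture(video)
idx = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % 5 == 0:  # keep every 5th frame (assumed sampling rate)
        cv2.imwrite(f"{outdir}/images/{saved:06d}.jpg", frame)
        saved += 1
    idx += 1
cap.release()

# COLMAP: extract features with a single shared camera, match sequentially
# (suitable for video), and map; the reconstruction yields the intrinsics.
subprocess.run(["colmap", "feature_extractor",
                "--database_path", f"{outdir}/database.db",
                "--image_path", f"{outdir}/images",
                "--ImageReader.single_camera", "1"], check=True)
subprocess.run(["colmap", "sequential_matcher",
                "--database_path", f"{outdir}/database.db"], check=True)
subprocess.run(["colmap", "mapper",
                "--database_path", f"{outdir}/database.db",
                "--image_path", f"{outdir}/images",
                "--output_path", f"{outdir}/sparse"], check=True)
```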
Once the intrinsics are obtained, you can run HI-SLAM2 with the following command:
python demo.py \
--imagedir PATH_TO_OUTPUT_DIR/images \
--calib PATH_TO_OUTPUT_DIR/calib.txt \
--config config/owndata_config.yaml \
--output outputs/owndata \
--undistort --droidvis --gsvis
There are some other command-line arguments you can use:
- `--undistort` undistort the images if distortion parameters are provided in the calib file
- `--droidvis` visualize the point cloud map and the intermediate results
- `--gsvis` visualize the Gaussian map
- `--buffer` max number of keyframes to pre-allocate memory for (default: 10% of total frames). Increase this if you encounter the error: `IndexError: index X is out of bounds for dimension 0 with size X`
- `--start` start frame index (default: from the first frame)
- `--length` number of frames to process (default: all frames)
Semantic Reconstruction
For semantic reconstruction capabilities, please check the Semantic branch. This branch extends HI-SLAM2 with additional features for semantic understanding and reconstruction.
Acknowledgement
We built this project based on DROID-SLAM, MonoGS, RaDe-GS, and 3DGS. The reconstruction evaluation is based on evaluate_3d_reconstruction_lib. We thank the authors for their great work and hope this open-source code can be useful for your research.
Citation
Our paper is available on arXiv. If you find this code useful in your research, please cite our paper.
@article{zhang2024hi2,
title={HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction},
author={Zhang, Wei and Cheng, Qing and Skuddis, David and Zeller, Niclas and Cremers, Daniel and Haala, Norbert},
journal={arXiv preprint arXiv:2411.17982},
year={2024}
}