OpenFMNav
OpenFMNav copied to clipboard
Official implementation of OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models
OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models
NAACL 2024 Findings
This is the official repository of OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models.
Setup
Dataset Preparation
Please follow HM3DSem to download the dataset and prepare the data. The data format should be:
data/
├── objectgoal_hm3d/
│ ├── train/
│ ├── val/
│ └── val_mini/
├── scene_datasets/
│ └── hm3d/
│ ├── minival/
│ └── val/
├── versioned_data/
├── matterport_category_mappings.tsv
└── object_norm_inv_perplexity.npy
Checkpoints
Please checkout Grounded-SAM to download groundingdino_swint_ogc.pth and sam_vit_h_4b8939.pth and put them into Grounded_SAM/.
Dependencies
-
Python & PyTorch
This code is tested on Python 3.9.16 on Ubuntu 20.04, with PyTorch 1.11.0+cu113.
-
Habitat-Sim & Habitat-Lab
# Habitat-Sim git clone https://github.com/facebookresearch/habitat-sim.git cd habitat-sim; git checkout tags/challenge-2022; pip install -r requirements.txt; python setup.py install --headless # Habitat-Lab git clone https://github.com/facebookresearch/habitat-lab.git cd habitat-lab; git checkout tags/challenge-2022; pip install -e . -
Grounded-SAM
Please checkout Grounded-SAM to install the dependencies.
-
Others
pip install -r requirements.txt
OpenAI API keys
You will need an OpenAI API key to use this repo. Please touch apikey.txt and paste your API key in the file.
Running
Example
An example command to run the pipeline:
CUDA_VISIBLE_DEVICES=0 python main.py --split val --eval 1 --auto_gpu_config 0 --prompt_type scoring \
-n 1 --num_eval_episodes 100 --text_threshold 0.55 --boundary_coeff 12 --start_episode 0 --tag_freq 100 \
--use_gtsem 0 --num_local_steps 20 --print_images 1 --exp_name test
Visualization
To make a demo video on your saved images, you can either use ffmpeg to make separate videos or use
python make_demo.py --exp_name test # add `--delete_img` to delete images after making video
to make batched videos.
Acknowledgements
This repo is heavily based on L3MVN. We thank the authors for their great work.
Citation
If you find this work helpful, please consider citing:
@article{kuang2024openfmnav,
title={OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models},
author={Kuang, Yuxuan and Lin, Hai and Jiang, Meng},
journal={arXiv preprint arXiv:2402.10670},
year={2024}
}