diffhoi
Official Reimplementation of Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips (DiffHOI, ICCV23) https://judyye.github.io/diffhoi-www/
Quick start
- Installation
```bash
# pytorch <= 1.10 to be compatible with FrankMocap
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge

# detectron2
python -m pip install detectron2 -f \
    https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html

# pytorch3d
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"

pip install -r requirements.txt
```
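Optionally, a quick sanity check (a suggestion, not part of the original instructions) can confirm that the pinned dependencies import and that CUDA is visible:

```bash
# Optional sanity check: torch, detectron2, and pytorch3d should import cleanly
python -c "import torch, detectron2, pytorch3d; print(torch.__version__, torch.cuda.is_available())"
```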
- Download the pretrained diffusion model from here and place it at `${environment.output}`. Download our reconstructed HOI4D results from here. Download the preprocessed HOI4D sequences from here.
- Path specification: specify your output folder in `configs/environment/learn.yaml`. (Even better practice is to create your own file `my_own.yaml` and append `environment=my_own` to the command in the terminal.) The expected layout is:

```
${environment.output}/
    # pretrained diffusion model
    release/
        ddpm2d/
            checkpoints/
            config.yaml
    # Our test-time optimization results
    release_reproduce/
        Mug_1/
            ckpts/
            config.yaml
        Mug_2/
        ...
# preprocessed data
${environment.output}/../
    HOI4D/
        Mug_1/
            cameras_hoi_smooth_100.npz
            image/
            mocap/
            ....
        Mug_2/
        ...
```
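A minimal sketch of such an environment file is shown below. The `output` key name is an assumption inferred from the `${environment.output}` interpolation used throughout this README; mirror the actual keys in `configs/environment/learn.yaml`.

```bash
# Hypothetical sketch: write configs/environment/my_own.yaml with your own paths.
# The `output` key is assumed from ${environment.output}; copy learn.yaml and edit
# it instead if the real config uses additional or different keys.
cat > configs/environment/my_own.yaml << 'EOF'
output: /path/to/your/output_folder
EOF
```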
Run on preprocessed HOI4D
Visualize Reconstruction
Suppose the models are under `${environment.output}/release_reproduce/`. The following command will render all models that match `${load_folder}*` and save the renderings to `${load_folder}/SOME_MODEL/vis_clips`.
```bash
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python -m tools.vis_clips -m \
    load_folder=release_reproduce/ fig=True
```
Note the trailing slash in `load_folder`, since the search pattern is `${load_folder}*`.
Replace `fig=True` with `video=True` to render the HOIs as videos. More visualization options are in `configs/eval.yaml` and `tools/vis_clips.py`.
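For example, the video-rendering variant of the command above would look like this (same arguments, only the flag swapped):

```bash
# Same call as above, rendering videos instead of figures
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python -m tools.vis_clips -m \
    load_folder=release_reproduce/ video=True
```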
Test-Time Optimization (~8 hours)
- For example, optimize a sequence under `${environment.output}/../HOI4D/Kettle_1`. The default values are specified in `configs/volsdf_nogt.yaml`.
```bash
CUDA_VISIBLE_DEVICES=1 PYTHONPATH=. python -m train -m \
    expname=dev/\${data.index} \
    data.cat=Kettle data.ind=1 \
    environment=my_own  # see above about path specification
```
- Parameter sweep
```bash
# see above about path specification; append
# `hydra/launcher=slurm environment=grogu_judy` if you want to use slurm
CUDA_VISIBLE_DEVICES=1 PYTHONPATH=. python -m train -m \
    expname=dev/\${data.index} \
    data.cat=Mug,Bottle,Kettle,Bowl,Knife,ToyCar data.ind=1,2 \
    environment=my_own
```
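With `expname=dev/\${data.index}` as above, the optimized checkpoints should land under `${environment.output}/dev/`, so (assuming the same layout as the release models) they can be rendered with the visualization command from the previous section:

```bash
# Assumed follow-up: render the freshly optimized models; dev/ matches the
# expname prefix used in the training commands above
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python -m tools.vis_clips -m \
    load_folder=dev/ fig=True
```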
Run on custom data
Preprocessing
- Extract per-frame masks and hand boxes from the videos with this repo.
- Reconstruct hand poses, then convert the extracted masks and poses to the data format consumed by DiffHOI:
```bash
python -m preprocess.inspect_custom --seq bottle_1,bottle_2 --out_dir save_path --inp_dir output_from_step1
```
Test-Time Optimization (~8 hours)
The following command reconstructs the preprocessed custom sequences under `${environment.output}/../${data.name}/${data.index}`.
```bash
CUDA_VISIBLE_DEVICES=6 PYTHONPATH=. python -m train -m \
    expname=in_the_wild/\${data.name}\${data.index} \
    data=custom data.name='WILD_CROP' \
    data.index=bottle_1,bottle_2
```
The command above assumes your sequences are organized as follows:
```
${environment.output}/../
    1st_nocrop/
        bottle_1/
            image/
            ...
        bottle_2/
        ...
```
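The preprocessing step writes to `--out_dir`, so make sure its output ends up at (or is linked to) the location above; the paths in the sketch below are placeholders, and the folder name must match `data.name`.

```bash
# Placeholder paths (adjust to your setup): expose the preprocessed sequences
# at ${environment.output}/../<data.name>/ where the training command looks
ln -s /path/to/save_path /path/to/output/../WILD_CROP
```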
Acknowledgement
- This project is built upon this amazing repo.
- We would also like to thank other great open-source projects:
- FrankMocap (for hand pose estimation)
- STCN (for video object segmentation)
- SMPL/SMPLX, MANO
- GLIDE and its modification, Guided Diffusion (for the diffusion model)
- Pytorch3D (for rendering)
- pytorch-lightning (for framework)