hand-pose-estimation
hand-pose-estimation copied to clipboard
Estimate 2D Hand Pose from RGB image by top-down method using Stacked Hourglass Network and Hand Detector module - SSD (single-shot detector)
Hand Pose Estimation
Introduction
This is a project we built for the Hand Pose Estimation problem. In this project, we tested the Stacked Hourglass Network model (a fairly well-known model used for Human Pose Estimation). In addition, we switched from the usual bottom-up method to the top-down by adding a hand-detect module. Here is the architecture model we use:
Prepare the environment
-
python==3.8.16
-
Install PyTorch-cuda==11.7 following official instruction:
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
-
Install the necessary dependencies by running:
pip install -r requirements.txt
Prepare the dataset
Please organize your datasets for training and testing following this structure:
Main-folder/
│
├── data/
│ ├── FreiHAND_pub_v2 - This folder contains data for training model
| | ├── ...
| |
│ └── FreiHAND_pub_v2_eval - public test images
| ├── ...
|
└── ...
- Put the downloaded FreiHAND dataset in ./data/
Link: https://lmb.informatik.uni-freiburg.de/data/freihand/FreiHAND_pub_v2.zip
- Put the downloaded FreiHAND evaluation set in ./data/
Link: https://lmb.informatik.uni-freiburg.de/data/freihand/FreiHAND_pub_v2_eval.zip
Running the code
Training
In this project, we focus on training Stacked Hourglass Network. As for the hand detect module, we'd like to use the victordibia's pretrained_model (SSD) without further modification. Train the hourglass network:
python 1.train.py --config-file "configs/train_FreiHAND_dataset.yaml"
The trained model weights (net_hm.pth) will be at Main-folder/. Copy and paste the trained model into ./model/trained_models before evaluate.
Evaluation
Evaluate on FreiHAND dataset:
python 2.evaluate_FreiHAND.py --config-file "configs/eval_FreiHAND_dataset.yaml"
The visualization results will be saved to ./output/
Real-time hand pose estimation
Prepare a camera with and clear angle, good light, and less noisy space. Run the following command line:
python 3.real_time_2D_hand_pose_estimation.py --config-file "configs/eval_webcam.yaml"
Note: Our model only solves the one-handed recognition problem. If there are 2 or more hands, the model will randomly select one hand to predict. To predict multiple hands, please edit the file 3.real_time_2D_hand_pose_estimation.py (because of resource and time limitations, we don't do this part).
Addition
To fine-tune the hyperparameters (BATCH_SIZE, NUM_WORKERS, DATA_SIZE, ...), you can edit the .yaml files in the ./configs/ directory.
Acknowledgment
The repo is developed based on victordibia and enghock1. Thanks for your contribution.