Single-Stage-Multi-person-Pose-Machines
A TensorFlow implementation of the arXiv paper "Single-Stage Multi-Person Pose Machines" (SPM).
This is an unofficial implementation. For details, see the paper or this CSDN link (for reference only).
TODO
- [x] ~~Custom distributed training does not work well: I trained for 10 epochs and nothing was learned at all. If anyone is familiar with this, please help me check it and make it work.~~
  The custom distributed training itself is correct. The problem was that the training script created the checkpoint with `checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)`, while `spm_model.py` uses `checkpoint = tf.train.Checkpoint(optimizer=optimizer, net=model)`. The keyword argument differs (`model=model` in one, `net=model` in the other), so a checkpoint saved with one cannot be restored with the other. The keyword arguments passed to `tf.train.Checkpoint` must therefore be the same when saving and when restoring. It also helps to call `checkpoint.restore(ckpt_path).assert_existing_objects_matched()` so that a restore error shows up as early as possible.
- [x] using tf.keras to run distributed training
- [ ] add coco eval while training
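The checkpoint keyword mismatch described in the first TODO item can be reproduced in a few lines. This is a minimal sketch, not code from this repo: `TinyNet` and `restore_matches` are hypothetical names, a bare `tf.Module` stands in for the real model, and the optimizer is left out for brevity. It only relies on `tf.train.Checkpoint`, `save`, `restore`, and `assert_existing_objects_matched`.

```python
import os
import tempfile

import tensorflow as tf


class TinyNet(tf.Module):
    """Hypothetical stand-in for the real SPM model."""

    def __init__(self, value):
        super().__init__()
        self.w = tf.Variable([value, value])


def restore_matches(ckpt_path, **objects):
    """Return True if every object passed in was matched by the checkpoint."""
    checkpoint = tf.train.Checkpoint(**objects)
    try:
        checkpoint.restore(ckpt_path).assert_existing_objects_matched()
        return True
    except AssertionError:
        # Raised when some Python object has no counterpart in the checkpoint.
        return False


tmp_dir = tempfile.mkdtemp()
# Save with the keyword name "net".
saved_path = tf.train.Checkpoint(net=TinyNet(1.0)).save(os.path.join(tmp_dir, "ckpt"))

same_key = TinyNet(0.0)
same_key_ok = restore_matches(saved_path, net=same_key)       # same keyword

other_key = TinyNet(0.0)
other_key_ok = restore_matches(saved_path, model=other_key)   # different keyword

print("restore with net=:", same_key_ok)
print("restore with model=:", other_key_ok)
```

Without the assertion, the mismatched restore would pass silently and the model would keep its initial weights, which looks the same as "nothing was learned".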
Requirements
- tensorflow 2.0.0
- python 3.6
- cuda 10
- imgaug == 0.3.0
- pycocotools
About Dataset
We use the first 12 keypoints of the ai-challenger format, which can be found on this website (the link may no longer work). The MSCOCO dataset works too, but you need to delete the five head keypoints and convert the rest to the ai-challenger format. Note that we still use pycocotools to load the data, so if you use ai-challenger you need to convert its annotation file into the COCO annotation format. Here is a conversion script, for reference only.
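Dropping the five head keypoints from a COCO annotation and reordering the rest could look roughly like the sketch below. It assumes the standard 17-joint COCO keypoint order and the 12-joint order listed at the end of this README. `coco_to_ai_keypoints` is a hypothetical helper, not the repo's conversion script, and the visibility flag is passed through unchanged, which may differ from what the repo expects.

```python
# Standard COCO keypoint order; indices 0-4 are the head points to drop.
COCO_JOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# The 12-joint order used in this README.
AI_JOINTS = [
    "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
]


def coco_to_ai_keypoints(coco_keypoints):
    """Reorder a flat COCO [x, y, v] * 17 list into the 12-joint order."""
    if len(coco_keypoints) != 3 * len(COCO_JOINTS):
        raise ValueError("expected 51 values (17 joints * [x, y, v])")
    converted = []
    for name in AI_JOINTS:
        start = 3 * COCO_JOINTS.index(name)
        # Copy x, y and the visibility flag as they are (assumption).
        converted.extend(coco_keypoints[start:start + 3])
    return converted


# Fake annotation: joint i sits at (i, i + 100) with visibility 2.
example = [value for i in range(17) for value in (i, i + 100, 2)]
converted_example = coco_to_ai_keypoints(example)
print(len(converted_example) // 3, "joints kept")
```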
Network Structure
This repo uses HRNet as the body network, but you can replace it with any other network you like. See nets/spm_model.py.
Single-GPU Training
python3 main.py
All configuration options can be found in config/center_config.py
Multi-GPU Training
python3 distribute/custom_train.py
~~Note that if you have four GPUs with ids [0, 1, 2, 3], using GPU ids [2, 3] does not work well for now. Only GPU ids [0, 1] or [0, 1, 2] work fine. I don't know why and hope someone can tell me.~~
After setting `os.environ['CUDA_VISIBLE_DEVICES'] = '2, 3'`, `gpu_ids = [2, 3]` cannot be used, because the visible GPUs are renumbered from 0: GPUs 2 and 3 on the machine appear to TensorFlow as GPUs 0 and 1. So, to run distributed training on GPUs 2 and 3, write:
os.environ['CUDA_VISIBLE_DEVICES'] = '2, 3'
gpu_ids = [0, 1]
devices = ['/device:GPU:{}'.format(i) for i in gpu_ids]
strategy = tf.distribute.MirroredStrategy(devices=devices)
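The renumbering can be made explicit with a small helper that turns GPU ids as numbered on the machine into the device strings TensorFlow sees. This is only a sketch: `logical_gpu_devices` is a hypothetical function, not part of this repo, and it assumes `CUDA_VISIBLE_DEVICES` holds plain integer ids rather than GPU UUIDs.

```python
def logical_gpu_devices(visible_devices, physical_ids):
    """Map physical GPU ids to TensorFlow device strings.

    visible_devices: the CUDA_VISIBLE_DEVICES string, e.g. "2, 3".
    physical_ids: GPU ids as numbered on the machine, e.g. [2, 3].
    """
    visible = [int(token) for token in visible_devices.split(",") if token.strip()]
    devices = []
    for physical_id in physical_ids:
        if physical_id not in visible:
            raise ValueError("GPU {} is hidden by CUDA_VISIBLE_DEVICES".format(physical_id))
        # The position in the visible list is the id the process sees.
        devices.append("/device:GPU:{}".format(visible.index(physical_id)))
    return devices


remapped = logical_gpu_devices("2, 3", [2, 3])
print(remapped)
```

The resulting list can be passed as the `devices` argument of `tf.distribute.MirroredStrategy`.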
Test on images
python3 tools/spm_model_test.py
Eval
Create the predictions JSON file:
python3 tools/model_val.py
Run the evaluation:
python3 tools/ai_format_kps_eval.py --ref true_label.json --submit predict.json
Detailed information can be found here.
About loss
In the spm_loss function, you need to set the weights of the two kinds of loss carefully so that they are numerically balanced.
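One way to think about the balance is as a weighted sum in which neither term dominates. The sketch below assumes the two terms are a root-joint heatmap loss and a displacement regression loss. The function names, the default weights, and the example values are all hypothetical and are not taken from the repo's spm_loss.

```python
def balanced_total(root_loss, disp_loss, root_weight=1.0, disp_weight=1.0):
    """Return the weighted total and each weighted term."""
    weighted_root = root_weight * root_loss
    weighted_disp = disp_weight * disp_loss
    return weighted_root + weighted_disp, weighted_root, weighted_disp


def suggest_disp_weight(root_loss, disp_loss, root_weight=1.0):
    """Weight that brings the displacement term to the root term's scale."""
    if disp_loss <= 0:
        raise ValueError("disp_loss must be positive")
    return root_weight * root_loss / disp_loss


# Made-up magnitudes: a small heatmap loss next to a large displacement loss.
observed_root, observed_disp = 0.02, 5.0
disp_weight = suggest_disp_weight(observed_root, observed_disp)
total, root_term, disp_term = balanced_total(
    observed_root, observed_disp, disp_weight=disp_weight)
print("suggested displacement weight:", disp_weight)
```

In practice a weight like this would be estimated once from the first few training steps and then kept fixed, rather than recomputed at every step.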
ai_format joints:
- right_shoulder
- right_elbow
- right_wrist
- left_shoulder
- left_elbow
- left_wrist
- right_hip
- right_knee
- right_ankle
- left_hip
- left_knee
- left_ankle