
implement "2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning"

2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning

Cloned from https://github.com/dluvizon/deephar, with some features added.

Added dataloaders and training code for the MERL and COCO datasets.

COCO dataset

Used for pose estimation. pycocotools is used to load the images and pose labels. Each image is cropped and resized to (256, 256). A pose has 16 keypoints.
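
For illustration, a minimal loading sketch (not the repo's exact dataloader), assuming a standard COCO layout; note that raw COCO annotations carry 17 joints, so the 16-joint layout used here implies a remapping step not shown below:

import cv2
import numpy as np
from pycocotools.coco import COCO

# Assumed paths; adjust to your COCO installation.
coco = COCO('annotations/person_keypoints_train2017.json')
ann_ids = coco.getAnnIds(catIds=coco.getCatIds(catNms=['person']), iscrowd=False)
ann = coco.loadAnns(ann_ids[:1])[0]
img_info = coco.loadImgs(ann['image_id'])[0]

# Crop the person box and resize to the (256, 256) network input.
img = cv2.imread('train2017/' + img_info['file_name'])
x, y, w, h = (int(v) for v in ann['bbox'])
crop = cv2.resize(img[y:y + h, x:x + w], (256, 256))

# Keypoints come flattened as [x1, y1, v1, x2, y2, v2, ...];
# reshape and rescale them into the cropped frame.
kpts = np.array(ann['keypoints'], dtype=np.float32).reshape(-1, 3)
kpts[:, 0] = (kpts[:, 0] - x) * 256.0 / max(w, 1)
kpts[:, 1] = (kpts[:, 1] - y) * 256.0 / max(h, 1)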

MERL dataset

1. MERL for pose estimation

Data is loaded from a pkl file; each element has the form:

{   'img_paths': '17_2_crop_1077_1108_RetractFromShelf-0004.jpg',
    'img_width': 920,
    'img_height': 680,
    'image_id': 1,
    'bbox': [602.7314290727887, 324.3657157897949, 218.69714235578266, 119.07430627005441],
    'num_keypoints': 24,
    'keypoints': [[0.0, 0.0, 2], [747.5, 366.0285714285714, 1], [0.0, 0.0, 2],
                  [741.4201450892857, 385.51311383928567, 1], [0.0, 0.0, 2],
                  [704.9828591482981, 418.7314265659877, 1], [782.9857142857143, 393.3, 0],
                  [632.4342917306083, 353.47713884626114, 1], [0.0, 0.0, 2],
                  [690.2628631591797, 350.8485674176897, 1], [764.5200060163226, 340.00571027483255, 1],
                  [0.0, 0.0, 2], [0.0, 0.0, 2], [0.0, 0.0, 2], [0.0, 0.0, 2],
                  [615.348577444894, 386.66285313197545, 1], [0.0, 0.0, 2]]}

keypoints is a list of points; each point is an array of the form [x, y, confidence_score].
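
A minimal reading sketch for this pkl (the filename is an assumption):

import pickle
import numpy as np

with open('merl_pose.pkl', 'rb') as f:  # assumed filename
    samples = pickle.load(f)

sample = samples[0]
x, y, w, h = sample['bbox']                             # [x, y, width, height]
kpts = np.array(sample['keypoints'], dtype=np.float32)  # (num_points, 3)
xy = kpts[:, :2]      # pixel coordinates
score = kpts[:, 2]    # confidence score per point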

2. MERL for action recognition

Pose and visual features are combined for action recognition. Data is loaded from a JSON file that contains the person bounding box for each frame. Each element has the form:

{
    "action": "LookatShelf",
    "keypoints": [],
    "id": "LookatShelf_33_1_crop_3243_3374_InspectProduct",
    "image": {
        "url": "/mnt/hdd10tb/Users/andang/actions/video/LookatShelf/33_1_crop_3243_3374_InspectProduct/2.jpg",
        "file_name": "2.jpg",
        "width": 920,
        "height": 680
    },
    "person_bbox": [418, 336, 559, 499]
}
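
A minimal parsing sketch, assuming person_bbox stores [x1, y1, x2, y2] corners (consistent with the 920x680 image size above):

import json
import cv2

# Assumed filename; see the --anno-path argument below.
with open('train_2.json') as f:
    annos = json.load(f)

entry = annos[0]
label = entry['action']
img = cv2.imread(entry['image']['url'])
x1, y1, x2, y2 = entry['person_bbox']  # assumed corner format
person = img[y1:y2, x1:x2]             # crop used as the visual input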

Running the code

Train COCO pose estimation

CUDA_VISIBLE_DEVICES=1 python exp/coco/train_coco_singleperson.py \
    --batch-size 16 --epochs 10

Train MERL pose estimation

CUDA_VISIBLE_DEVICES=1 python exp/merl/train_merl_singleperson.py \
    --batch-size 16 --epochs 10

Train MERL action recognition

CUDA_VISIBLE_DEVICES=2 python exp/merl/train_merl_video.py \
    --num-frames 4 --anno-path /mnt/hdd10tb/Users/andang/actions/train_2.json \
    --val-anno-path /mnt/hdd10tb/Users/andang/actions/test_2.json

Model


File reception.py: pose estimation model

  • input: a list of images, shape = (height, width, channels)
  • output: the predicted pose

File action.py: action recognition model

  • input: a list of images, shape = (num_frames, height, width, channels)
  • output: the predicted action, one-hot encoded

File action_2D.py is similar to action.py, with some code removed so that it runs on 2D poses only.

File action_pose.py: a model that predicts the action from the outputs of the pose model.

  • input
    • y: pose coordinates (time-distributed), shape = (1, num_frames, num_joints, 2), where the last axis holds the (x, y) coordinates
    • p: visibility probability of each point, shape = (1, num_frames, num_joints, 1)
    • hs: heatmaps, shape = (1, num_frames, 32, 32, num_joints)
    • xb1: visual features output from the Stem model, shape = (1, num_frames, 32, 32, 576)
  • output: the predicted action
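
As an illustration only, a minimal Keras functional sketch of wiring these four inputs into an action head; the pooling and fusion layers below are stand-ins, not the repo's actual architecture, and the num_frames/num_joints/num_actions values are assumptions:

import tensorflow as tf
from tensorflow.keras import layers, Model

num_frames, num_joints, num_actions = 4, 16, 13   # assumed sizes

y   = layers.Input(shape=(num_frames, num_joints, 2), name='y')        # pose coordinates
p   = layers.Input(shape=(num_frames, num_joints, 1), name='p')        # visibility probs
hs  = layers.Input(shape=(num_frames, 32, 32, num_joints), name='hs')  # heatmaps
xb1 = layers.Input(shape=(num_frames, 32, 32, 576), name='xb1')        # Stem features

# Appearance stream: pool the Stem features at each joint's heatmap,
# giving one 576-d appearance vector per joint per frame.
app = layers.Lambda(
    lambda t: tf.einsum('bthwj,bthwc->btjc', t[0], t[1]))([hs, xb1])

# Pose stream: coordinates concatenated with their visibility probabilities.
pose = layers.Concatenate()([y, p])           # (num_frames, num_joints, 3)

feat = layers.Concatenate()([app, pose])      # (num_frames, num_joints, 579)
feat = layers.GlobalAveragePooling2D()(feat)  # average over frames and joints
out = layers.Dense(num_actions, activation='softmax')(feat)

model = Model([y, p, hs, xb1], out)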

Validation

CUDA_VISIBLE_DEVICES=1 python exp/merl/val_merl_video.py

Citing

Please cite the paper below if this software (or any part of it) or the weights are useful to you.

@InProceedings{Luvizon_2018_CVPR,
  author = {Luvizon, Diogo C. and Picard, David and Tabia, Hedi},
  title = {2D/3D Pose Estimation and Action Recognition Using Multitask Deep Learning},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2018}
}

License

MIT License