
Multi Person and Multi Action for Training

Open Zepharchit opened this issue 2 years ago • 2 comments

Hi, I am currently experimenting with action recognition. I am using an STGCN-based network to recognize actions in videos, and I want to extend the experiment to recognize multiple actions in a single video/frame, i.e. multiple people performing different actions in the same frame or video. I looked at a similar issue (https://github.com/open-mmlab/mmskeleton/issues/109) and would like clarification on a few points about multi-action recognition:

  1. For an input of shape (N, C, T, V, M), where N is the batch size, C the number of channels, T the temporal length, V the number of joints and M the number of persons, how do I extend the model to handle multiple people? The linked issue suggests reshaping the input to (NM, C, T, V, 1). Is NM simply N × M, or is there a specific way to permute the ndarray into that shape?
  2. For multi-action detection in a single video, how should the data be prepared when different people in the video have different labels?
  3. If a model has been trained on Kinetics-400 for multi-action recognition in a video, could you please share the model weights?
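
On point 1: NM is N × M, but the person axis has to be moved next to the batch axis before the two are merged; calling `reshape` directly on (N, C, T, V, M) would interleave channels and persons. A minimal NumPy sketch follows (the helper names `fold_persons_into_batch` and `unfold_batch` are hypothetical, not mmaction2 APIs). It only regroups the data and assumes that person slot m holds the same person in every frame.

```python
import numpy as np


def fold_persons_into_batch(x: np.ndarray) -> np.ndarray:
    """(N, C, T, V, M) -> (N*M, C, T, V, 1): each person becomes its own sample."""
    n, c, t, v, m = x.shape
    # Move the person axis right after the batch axis: (N, M, C, T, V)
    x = x.transpose(0, 4, 1, 2, 3)
    # Merge N and M into one axis, then add a singleton person axis at the end
    return x.reshape(n * m, c, t, v, 1)


def unfold_batch(scores: np.ndarray, n: int, m: int) -> np.ndarray:
    """(N*M, num_classes) -> (N, M, num_classes): one score vector per person."""
    return scores.reshape(n, m, -1)
```

Row `n * M + m` of the folded array is person m of clip n, so the classifier's (N*M, num_classes) output can be unfolded back into per-person scores. PyTorch tensors support the same pattern with `permute(0, 4, 1, 2, 3).reshape(...)`.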

Zepharchit • Apr 05 '23 06:04

@Zepharchit Current skeleton-based action recognition models assume that there is only one action per video/frame, the same assumption made in (video) action recognition. What you want to implement is closer to spatio-temporal action detection, which performs person detection first and then recognizes the action of each detected person.

  1. Split the input of all persons, of shape (N, C, T, V, M), into M per-person inputs of shape (N, C, T, V, 1), and run inference M times. Note that this requires the identity (id) of each person in every frame; otherwise the skeleton sequences of different people get entangled. You can obtain separate per-person sequences through detection and tracking.
  2. As explained in the previous answer.
  3. We currently don't provide a K400 (Kinetics-400) pretrained skeleton checkpoint; you can refer to pyskl. But as far as I know, it is trained for single-action recognition.
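
The per-person split in point 1 can be sketched as follows, assuming an upstream detector and tracker already attach a track id to every skeleton. The names `split_by_track`, `recognize_per_person` and `classify` are hypothetical: `classify` stands in for any single-person skeleton model, and zero-filling the frames where a track is absent is only one simple choice.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

import numpy as np

# One frame: a list of (track_id, keypoints) pairs, keypoints shaped (V, C),
# e.g. C = 3 for (x, y, score). Track ids are assumed to come from a tracker.
Frame = List[Tuple[int, np.ndarray]]


def split_by_track(frames: List[Frame], num_joints: int,
                   num_channels: int) -> Dict[int, np.ndarray]:
    """Return {track_id: array of shape (1, C, T, V, 1)}.

    Frames in which a track does not appear are left as zeros.
    """
    t_total = len(frames)
    seqs: Dict[int, np.ndarray] = defaultdict(
        lambda: np.zeros((num_channels, t_total, num_joints), dtype=np.float32))
    for t, frame in enumerate(frames):
        for track_id, kpts in frame:
            seqs[track_id][:, t, :] = kpts.T  # (V, C) -> (C, V)
    # Add the batch axis (N = 1) and the person axis (M = 1)
    return {tid: s[None, ..., None] for tid, s in seqs.items()}


def recognize_per_person(frames: List[Frame],
                         classify: Callable[[np.ndarray], object],
                         num_joints: int, num_channels: int) -> Dict[int, object]:
    """Run the single-person model once per track: M inferences for M people."""
    per_track = split_by_track(frames, num_joints, num_channels)
    return {tid: classify(x) for tid, x in per_track.items()}
```

Each track gets its own prediction. Under this framing, training labels would likewise be attached to each person track rather than to the whole video.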

cir7 • Apr 11 '23 09:04

@Zepharchit May I ask whether you have implemented this? Could you show me how? Thank you very much.

SCsleepy • Apr 16 '24 02:04