pyskl
PoseConv3D pseudo heatmap volume
Thanks for your promising work (PoseConv3D). I am wondering why you stack a separate pseudo heatmap for each joint along the temporal dimension instead of using a single image that contains all joints. For example, for a video clip of 30 frames, stacking per-joint heatmaps produces an array of 17 (num_joints) × 30 × 56 (height) × 56 (width), whereas putting all joints in a single frame produces an array of 30 × 56 × 56.
With the latter choice, you lose the semantic information about each joint: if two joints are just two peaks in the same channel, how can the network tell which one is which?
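To make this concrete, here is a minimal NumPy sketch. It is not pyskl's own heatmap generation code. The helper names `joint_heatmap` and `heatmap_volume`, the Gaussian with `sigma=0.6`, and merging joints into one channel with `max` are all assumptions chosen for illustration. The sketch builds the K × T × H × W per-joint volume and the collapsed T × H × W volume, then swaps two joints (indices 9 and 10, the left and right wrists in COCO keypoint order). The collapsed volume does not change at all, while the per-joint volume does.

```python
import numpy as np

def joint_heatmap(x, y, h=56, w=56, sigma=0.6):
    """Hypothetical helper: Gaussian pseudo heatmap for one joint at (x, y)."""
    ys, xs = np.mgrid[0:h, 0:w]  # row and column index grids
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

def heatmap_volume(keypoints, h=56, w=56, sigma=0.6):
    """Hypothetical helper: (T, K, 2) keypoints -> (K, T, H, W) volume,
    i.e. one heatmap channel per joint, stacked along time."""
    T, K, _ = keypoints.shape
    vol = np.zeros((K, T, h, w), dtype=np.float32)
    for t in range(T):
        for k in range(K):
            x, y = keypoints[t, k]
            vol[k, t] = joint_heatmap(x, y, h, w, sigma)
    return vol

rng = np.random.default_rng(0)
T, K = 30, 17
kps = rng.uniform(0, 56, size=(T, K, 2))  # random (x, y) per joint per frame

per_joint = heatmap_volume(kps)     # (17, 30, 56, 56): joint identity = channel
collapsed = per_joint.max(axis=0)   # (30, 56, 56): all joints in one channel

# Swap the coordinates of joints 9 and 10 (left/right wrist in COCO order).
swapped = kps.copy()
swapped[:, [9, 10]] = kps[:, [10, 9]]
per_joint_sw = heatmap_volume(swapped)
collapsed_sw = per_joint_sw.max(axis=0)

print(per_joint.shape, collapsed.shape)      # (17, 30, 56, 56) (30, 56, 56)
print(np.allclose(collapsed, collapsed_sw))  # True: the swap is invisible
print(np.allclose(per_joint, per_joint_sw))  # False: per-joint channels differ
```

Because `max` (like `sum`) is symmetric in its inputs, any permutation of joints yields exactly the same single-channel volume. Only the per-joint layout lets the network distinguish, say, a left wrist from a right wrist.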