How to generate the features (such as VGGish or ResNet) used by MultimodalVideoTag? This part seems not to be implemented in the repo

Open aiot-tech opened this issue 3 years ago • 4 comments

Thanks for such great work! When I run the code, a problem occurs: how do I generate the features (such as VGGish or ResNet) used by MultimodalVideoTag? This part does not seem to be implemented in the repo, so a customized dataset cannot be run correctly. Is there an example showing how to run the repo completely?

aiot-tech avatar May 17 '22 10:05 aiot-tech

You can refer to the feature extraction part of FootballAction:

https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/FootballAction#step14--基于pp-tsm的视频特征提取

huangjun12 avatar May 17 '22 12:05 huangjun12

Thank you for your reply! It looks like the PP-TSM checkpoint may have been trained on the football dataset (I may be wrong :-). Could I reuse it to extract image features from other datasets? If not, will PaddlePaddle provide pretrained weights for ResNet and VGGish? Another thing: the dimension of the VGGish features used in the PP-TSM project doesn't match what MultimodalVideoTag requires.

aiot-tech avatar May 18 '22 10:05 aiot-tech

Also, could you tell me which architecture and layer of ResNet and VGGish (for example ResNet50 or ResNet101, the avgpool layer or layer4) you use to extract the features?

aiot-tech avatar May 18 '22 12:05 aiot-tech

Is there any pipeline you can provide that includes extracting the basic features?

aiot-tech avatar May 23 '22 09:05 aiot-tech