How to generate the features (such as VGGish or ResNet) used by MultimodalVideoTag? This part seems not to be implemented in the repo
Thanks for such great work! When I run the code, a problem occurs: how are the features (such as VGGish or ResNet) used by MultimodalVideoTag generated? This part seems not to be implemented in the repo, so a customized dataset cannot be run correctly. Is there any example showing how to run the repo end to end?
You can refer to the feature-extraction part of FootballAction:
https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/FootballAction#step14--基于pp-tsm的视频特征提取
Thank you for your reply! I find that the checkpoint of the PP-TSM model may have been trained on the football dataset (maybe I am wrong :-)). Can I reuse it to extract image features from other datasets? If not, will PaddlePaddle provide pretrained weights for ResNet and VGGish? Another issue is that the dimension of the VGGish features used in the PP-TSM project does not match what MultimodalVideoTag requires.
Also, could you tell me which architecture and layer (e.g., ResNet-50 or ResNet-101, the avg-pool layer or layer4) of ResNet and VGGish you use to extract features? A sketch of my current assumption follows below.
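For reference, here is a minimal sketch of what I am currently assuming for the image side: per-frame 2048-d features taken from the global average-pool output of an ImageNet-pretrained ResNet-50 in PaddlePaddle. The layer choice and preprocessing here are my guesses, not something confirmed by the repo:

```python
# Assumption: 2048-d per-frame features from ResNet-50's avg-pool output.
import paddle
import paddle.vision.transforms as T
from paddle.vision.models import resnet50

model = resnet50(pretrained=True)
model.fc = paddle.nn.Identity()  # drop the classifier; keep the pooled 2048-d vector
model.eval()

# Standard ImageNet preprocessing (also an assumption).
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@paddle.no_grad()
def frame_features(frames):
    """frames: list of HxWx3 uint8 RGB arrays sampled from one video."""
    batch = paddle.stack([preprocess(f) for f in frames])
    return model(batch)  # shape: [num_frames, 2048]
```

Is this roughly what the released features were extracted with, or does the official pipeline differ (e.g., a different layer or sampling scheme)?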
Could you provide a pipeline that covers extracting these basic features?
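On the audio side, my current workaround (again an assumption, not the repo's pipeline) is the community torchvggish port of the AudioSet VGGish model, which yields one 128-d embedding per ~0.96 s of audio; `example_audio.wav` is a placeholder path:

```python
# Assumption: 128-d VGGish audio embeddings via the harritaylor/torchvggish hub model.
import torch

model = torch.hub.load('harritaylor/torchvggish', 'vggish')  # downloads AudioSet weights
model.eval()

# One 128-d embedding per ~0.96 s segment; shape: [num_segments, 128].
embeddings = model.forward('example_audio.wav')
```

If MultimodalVideoTag expects a different dimension or a different VGGish layer, a pointer to the exact extraction setup would be very helpful.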