howto100m

How to aggregate n x 2048 features into one 2048 feature?

Open dixonhsiao opened this issue 5 years ago • 1 comments

It seems that in your training/eval data there is only one 2048-d 2D feature and one 2048-d 3D feature per sentence. But with the feature extractor at https://github.com/antoine77340/video_feature_extractor , a sentence seems to produce n x 2048 features (n for 2D if the sentence is n seconds long, and approximately n/1.5 for 3D). How do I aggregate the n x 2048 features into a single 2048 feature using temporal max-pooling, as stated in your paper? Do I just select the max value for each dimension?

dixonhsiao, Sep 10 '19 07:09

Yes, you can max-pool over the n features, i.e. take the max value of each dimension. For example, you could add nn.AdaptiveMaxPool2d((1, 2048)) after feature loading.
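To make the "max value for each dimension" idea concrete, here is a minimal NumPy sketch. It assumes the features for one sentence are already loaded as an array of shape (n, d), with d = 2048 in this thread. The helper name temporal_max_pool and the toy values are made up for illustration and are not part of the howto100m code.

```python
import numpy as np


def temporal_max_pool(features: np.ndarray) -> np.ndarray:
    """Collapse an (n, d) array of per-step features into one (d,) vector.

    Hypothetical helper: takes the max of each feature dimension over time.
    """
    if features.ndim != 2 or features.shape[0] == 0:
        raise ValueError("expected a non-empty (n, d) array")
    # axis=0 is the temporal axis, so each output entry is the max over n steps.
    return features.max(axis=0)


# Toy stand-in for extractor output: n = 3 time steps, d = 4 instead of 2048.
clip_features = np.array(
    [
        [0.1, 0.9, 0.3, 0.0],
        [0.5, 0.2, 0.8, 0.1],
        [0.4, 0.7, 0.6, 0.2],
    ],
    dtype=np.float32,
)

pooled = temporal_max_pool(clip_features)
print(pooled.shape)  # (4,)

# Same call on a random array with the shape discussed here (n = 7, d = 2048).
rng = np.random.default_rng(0)
pooled_2048 = temporal_max_pool(rng.random((7, 2048), dtype=np.float32))
print(pooled_2048.shape)  # (2048,)
```

If the features are PyTorch tensors instead, features.max(dim=0).values gives the same result for an (n, 2048) tensor.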

bjuncek, Dec 30 '19 13:12