moment_detr Question about the video encoder ViT

Open Summer-seu opened this issue 1 year ago • 1 comment

Hi, thanks for your great work! I have a question: how do you fuse the image features from a 2-second clip into a single clip-level video feature, given that ViT is a feature extraction model for images, not videos?

Sep 16 '23 10:09 Summer-seu

We sample one video frame (an image) every 2 seconds and extract an embedding for it.

Oct 04 '23 18:10 jayleicn
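
For readers who want to see the sampling described in the reply spelled out, here is a minimal sketch. It is not the repository's code. The function names `sample_frame_indices` and `extract_clip_features` are hypothetical, `embed_frame` is a stand-in for whatever ViT image encoder is used, and taking the first frame of each 2-second window is an assumption (the reply does not say which frame within the window is picked).

```python
from typing import Callable, List, Sequence


def sample_frame_indices(num_frames: int, fps: float, clip_len_s: float = 2.0) -> List[int]:
    """Return one frame index per clip_len_s-second window.

    Assumption: the first frame of each window is used.
    """
    if num_frames <= 0 or fps <= 0 or clip_len_s <= 0:
        return []
    indices = []
    k = 0
    while True:
        # Frame closest to the start of the k-th window.
        idx = int(round(k * clip_len_s * fps))
        if idx >= num_frames:
            break
        indices.append(idx)
        k += 1
    return indices


def extract_clip_features(
    frames: Sequence,
    fps: float,
    embed_frame: Callable,  # hypothetical stand-in for a ViT image encoder
    clip_len_s: float = 2.0,
) -> List:
    """Embed one sampled frame per window; each embedding serves as that clip's feature."""
    return [embed_frame(frames[i]) for i in sample_frame_indices(len(frames), fps, clip_len_s)]


if __name__ == "__main__":
    # A 10-second video at 30 fps, with integers standing in for decoded frames.
    dummy_frames = list(range(300))
    features = extract_clip_features(dummy_frames, 30.0, lambda frame: [float(frame)])
    print(len(features))
```

Under this reading, each 2-second clip is represented by the embedding of a single image, so no per-clip fusion of multiple frame features is needed.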
