Missing sparse video feature extraction module
To run inference on a custom video dataset, we need to sparsely extract video features the same way you do in order to get good results. It would be great if you could make that module available in the repo.
Hi, thanks for your interest. I have uploaded the related code (for reference only). To extract region features, you need to sample frames in the same way and then use the tool provided by BUTD.
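For readers unsure what "sample frames in the same way" means in practice: a common sparse-sampling scheme is to divide each video into a fixed number of clips and take a few evenly spaced frames from each clip. The function below is only an illustrative sketch of that idea (the clip/frame counts and the function name are my own assumptions, not taken from the repo's code):

```python
def sample_frame_indices(num_frames, num_clips=8, frames_per_clip=4):
    """Sparsely sample frames: split the video into `num_clips`
    equal-length clips and take `frames_per_clip` evenly spaced
    frames from each clip. Returns a flat, ordered list of indices.
    NOTE: the default counts here are hypothetical, not the repo's."""
    indices = []
    clip_len = num_frames / num_clips
    for c in range(num_clips):
        start = c * clip_len
        for f in range(frames_per_clip):
            # center of the f-th sub-segment inside clip c
            pos = start + (f + 0.5) * clip_len / frames_per_clip
            indices.append(min(int(pos), num_frames - 1))
    return indices
```

For a 100-frame video this yields 32 frame indices (8 clips x 4 frames), which you would then feed to the BUTD extraction tool.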
Thank you very much for providing them. It would also be helpful if you could add some documentation to the files and functions, so that we can better understand the starting point and the steps to follow in order to extract features properly.
Basically, you can follow this coarse pipeline: extract_video.py (decode mp4 files into frames) -> preprocess_feature.py (sample the frames and encode them into CNN representations) -> split_dataset_feat.py (split the features into train/val/test).
That's so helpful. Thanks for explaining it.
Which mode, 'caffe' or 'd2', did you use to extract the regional features?
Please choose resnet-101 with d2.
@doc-doc It seems that object_align.py does not provide a complete method for obtaining the bounding boxes; it directly reads region_8c10b_{}.h5. Is there complete code that detects the bounding boxes and then writes them to region_8c10b_{}.h5?