CoVGT

Missing sparse video feature extraction module

Open kimia-cvengineer opened this issue 2 years ago • 7 comments

To run inference on a custom video dataset, we need to extract video features sparsely, the same way you do, in order to get good results. It would be great if you could make that module available in the repo.

kimia-cvengineer avatar Oct 23 '23 17:10 kimia-cvengineer

Hi, thanks for the interest. I have uploaded the related code (for reference only). To extract region features, you need to sample frames in the same way and use the tool provided by BUTD.

doc-doc avatar Oct 24 '23 08:10 doc-doc

Thank you very much for providing them. It would also be good if you could add some documentation to the files and functions, so that we can better understand the starting point and the steps to follow in order to extract features properly.

kimia-cvengineer avatar Oct 24 '23 23:10 kimia-cvengineer

Basically, you can follow a coarse pipeline: extract_video.py (decode mp4 into frames) -> preprocess_feature.py (sample and encode frames into CNN representations) -> split_dataset_feat.py (split the features into train/val/test).

doc-doc avatar Oct 25 '23 04:10 doc-doc
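As a rough illustration of the "sample" and "split" steps in the pipeline above, here is a minimal sketch in plain Python. It is not taken from the repo's scripts: the function names, the default of 8 clips with 4 frames each, and the dictionary layout are all assumptions made for the example. Decoding frames and encoding them with a CNN are left out.

```python
from typing import Dict, List, Sequence


def sample_clip_indices(num_frames: int,
                        num_clips: int = 8,
                        frames_per_clip: int = 4) -> List[List[int]]:
    """Sparsely sample frame indices (hypothetical helper).

    The video is cut into `num_clips` equal segments, and each segment
    contributes `frames_per_clip` evenly spaced frame indices. The default
    values are placeholders, not values confirmed by the repo.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    clips = []
    for c in range(num_clips):
        # Boundaries of the c-th equal-length segment of the video.
        start = c * num_frames / num_clips
        end = (c + 1) * num_frames / num_clips
        step = (end - start) / frames_per_clip
        # Centre of each sub-segment, clamped to a valid frame index.
        clip = [min(num_frames - 1, int(start + (k + 0.5) * step))
                for k in range(frames_per_clip)]
        clips.append(clip)
    return clips


def split_features(features: Dict[str, Sequence[float]],
                   split_ids: Dict[str, List[str]]) -> Dict[str, Dict[str, Sequence[float]]]:
    """Group per-video features by split name (hypothetical helper).

    `features` maps a video id to its feature vector; `split_ids` maps a
    split name such as "train" to the video ids it contains. Ids without
    extracted features are skipped.
    """
    return {name: {vid: features[vid] for vid in ids if vid in features}
            for name, ids in split_ids.items()}
```

For a 32-frame video, `sample_clip_indices(32)` yields eight clips of four indices each. Videos shorter than `num_clips * frames_per_clip` frames simply repeat indices, which keeps the output shape fixed.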

That's so helpful. Thanks for explaining it.

kimia-cvengineer avatar Oct 25 '23 18:10 kimia-cvengineer

Which mode, 'caffe' or 'd2', did you use to extract the region features?

kimia-cvengineer avatar Oct 26 '23 21:10 kimia-cvengineer

Please choose resnet-101 with d2.

doc-doc avatar Nov 17 '23 05:11 doc-doc

@doc-doc It seems that object_align.py does not give a complete method to obtain the bounding box, but directly reads region_8c10b_{}.h5. Is there any complete code that can detect the bounding box and then write it to region_8c10b_{}.h5?

Khadgar123 avatar Mar 28 '24 06:03 Khadgar123