arctic-capgen-vid
Input data preparation
Thanks for the great code! We want to use the library to get some results on our own dataset. When we analyzed the pkl files (from the provided zip file), the number of GoogLeNet features (1024-dim) for each video was less than the actual total number of frames in the video. It seems some kind of frame sampling is applied, or it could come from the HoG, HoF, and MBH feature cube described in the paper, but this is unclear. Once the features are obtained, they are split into 26 equally spaced clips, from which the first frame is taken as input. Are any scripts also released for preparing the input pkl data? Thanks.
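For reference, the "26 equally spaced clips, first frame of each" step described above can be sketched as below. This is a guess at the sampling convention (including the padding behavior for short videos), not the authors' actual preprocessing script:

```python
import numpy as np

def subsample_frames(features, n_clips=26):
    """Split a (n_frames, feat_dim) feature array into n_clips equally
    spaced clips and keep the first frame of each clip.

    Assumption: videos shorter than n_clips frames are padded by
    repeating the last frame; the original code may use a different
    convention.
    """
    n_frames = features.shape[0]
    if n_frames < n_clips:
        # pad by repeating the last frame (one plausible convention)
        pad = np.tile(features[-1:], (n_clips - n_frames, 1))
        return np.concatenate([features, pad], axis=0)
    # first-frame index of each of the n_clips equally spaced clips
    idx = np.linspace(0, n_frames, n_clips, endpoint=False).astype(int)
    return features[idx]

feats = np.random.rand(200, 1024).astype("float32")
print(subsample_frames(feats).shape)  # (26, 1024)
```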
Regarding "for each video are less than the actual total number of frames in the video": I think they should be the same, unless you change the sampling rate during frame extraction with ffmpeg.
The video feature pkl is nothing but a Python dictionary, and the other files just build the dictionary of the captions. They should be easy to make. If you really need one, I could try to dig it out for you.
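Based on the description above, a minimal sketch of building such a feature pkl might look like the following. The exact layout (dict mapping video ID to a (n_frames, 1024) GoogLeNet feature array) is an assumption inferred from the file name FEAT_key_vidID_value_features.pkl; random arrays stand in for real features:

```python
import pickle
import numpy as np

# Hypothetical layout: video ID -> (n_frames, 1024) float32 feature array.
# Replace the random arrays with real per-frame GoogLeNet features.
features = {
    "vid1": np.random.rand(26, 1024).astype("float32"),
    "vid2": np.random.rand(26, 1024).astype("float32"),
}

with open("FEAT_key_vidID_value_features.pkl", "wb") as f:
    pickle.dump(features, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load it back to check the structure
with open("FEAT_key_vidID_value_features.pkl", "rb") as f:
    loaded = pickle.load(f)
print(sorted(loaded.keys()), loaded["vid1"].shape)
```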
Thank you. It would be very helpful to look at the scripts for creating the pkl files, especially the same pkl files (FEAT_key_vidID_value_features.pkl) that come in the zip file.
If I want to train the model on my own data, how should I prepare the C3D features of the videos? I didn't find the corresponding files in your project. @yaoli
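For whoever runs into this: C3D consumes fixed-length (typically 16-frame) clips, so one piece of the preparation is grouping decoded frames into that shape before running the network. A minimal sketch of that grouping step, with zero arrays standing in for real frames (actually running C3D in Caffe/PyTorch is outside this sketch):

```python
import numpy as np

def make_c3d_clips(frames, clip_len=16):
    """Group a (n_frames, H, W, 3) frame array into non-overlapping
    clips of clip_len frames, the input shape C3D expects.
    Leftover frames at the end are dropped (one plausible convention)."""
    n = (frames.shape[0] // clip_len) * clip_len
    return frames[:n].reshape(-1, clip_len, *frames.shape[1:])

# 100 dummy frames at C3D's usual 112x112 resolution
frames = np.zeros((100, 112, 112, 3), dtype="float32")
print(make_c3d_clips(frames).shape)  # (6, 16, 112, 112, 3)
```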
Hi @sxs4337, I have the same problem: the number of features for each video is less than the actual total number of frames in the video. How did you solve it?
Hi @yaoli @sxs4337, have you dealt with this problem? Could you share a script for creating the pkl file? Thank you very much!
I was not able to clarify this, so I ended up using this repo (https://github.com/sxs4337/SA-tensorflow) and added a few data preparation scripts there. Hope it helps.