
How to extract Cast and Activity features?

Open albertaparicio opened this issue 4 years ago • 19 comments

Now that there are new models uploaded to Google Drive, I am trying to process a video with all 4 modes, but I do not see how I can extract the features for the Cast and Activity modes.

Could you give me any pointers on this?

Thank you

albertaparicio avatar Jul 14 '20 14:07 albertaparicio

Please check out here https://github.com/movienet/movienet-tools

And we will keep on improving it 🎉

Thanks for your interest, 😄

AnyiRao avatar Jul 18 '20 07:07 AnyiRao

Hi,

Thanks for this excellent project!

Just wondering where "cast_feat", "act_feat", and "aud_feat" are located in the Google drive?

Running "run.sh" produced a lot of errors due to missing files under these directories. For example:


FileNotFoundError: [Errno 2] No such file or directory: '../data/scene318/act_feat/tt1375666.pkl'


Thanks so much and have a good day!

miaoqiz avatar Aug 12 '20 21:08 miaoqiz

Hi miaoqiz

Thanks for your interest. The features are already uploaded; you may follow the guidance at https://github.com/AnyiRao/SceneSeg/blob/master/docs/INSTALL.md#prepare-datasets-for-scene318 and use wget to download them.

Best,

AnyiRao avatar Aug 14 '20 10:08 AnyiRao

Hi @AnyiRao, how can I extract cast_feat and act_feat using https://github.com/movienet/movienet-tools for a new video? The code does not seem complete.

xpngzhng avatar Sep 20 '20 03:09 xpngzhng

Hello xpngzhng,

You need to use scripts/dist_infer.sh to start the cast_feat and act_feat extraction scripts.

You may refer to the following place feature extraction example.

# Place feat
bash scripts/dist_infer.sh scripts/extract_place_feat.py 8 --listfile ../data/meta/frame_240P.txt --img_prefix ./ --save_path ../data/place_feat.npy --imgs_per_gpu 256


AnyiRao avatar Sep 20 '20 13:09 AnyiRao

Hi AnyiRao

I want to use SceneSeg and movienet-tools to run video clip scene segmentation using aud_feat, place_feat, cast_feat, and act_feat, so I need to split shots first and then extract all four kinds of features.

I can extract place_feat and aud_feat using SceneSeg, since the code is available, but I still need to extract cast_feat and act_feat. I cannot find the code to extract cast_feat in movienet-tools. Also, as the previous comment shows, in movienet-tools the extracted place feature of a movie is stored in a single npy file, which is different from what SceneSeg requires.


xpngzhng avatar Sep 21 '20 01:09 xpngzhng

Hello xpngzhng,

Thanks for your interest in the project. You may refer to http://docs.movienet.site/movie-toolbox/tools/extract_feature, or you could implement something like the following. The cast feature consists of the face feature and the person body feature. An example of extracting the face feature is as follows; the person body feature extractor loads ./model/resnet50_csm.pth.

# init a face extractor
from movienet.tools import FaceExtractor
weight_ext = './model/irv1_vggface2.pth'
extractor = FaceExtractor(weight_ext, gpu=0)

# extract the face feature
feat = extractor.extract($IMG$)  # need to specify the $IMG$

Best,


AnyiRao avatar Sep 21 '20 01:09 AnyiRao

Thank you for your quick response. I will take a closer look at the movienet-tools code and examine the pkl files required by SceneSeg in more detail.

xpngzhng avatar Sep 21 '20 01:09 xpngzhng

Hi @AnyiRao Sorry to bother you again

It is not difficult to extract cast_feat with movienet-tools; at least I can organize a pkl file with the same keys and feature dimension as the scene318 dataset's movie cast_feat.

But I failed to produce a similar act_feat. In the act_feat of a movie in the scene318 dataset, each shot has a 512-length feature [screenshot]. What I obtain from movienet-tools instead gives each shot a list of dicts, each with a feat of length 2048 [screenshot], which seems to be the length of a Fast R-CNN RoI feature.

I hope I am using movienet-tools in the correct way. Is it possible to obtain a 512-length action feature for each shot?
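For context, here is a minimal sketch of organizing per-shot features into a scene318-style pkl file. The key layout (zero-padded shot IDs mapping to 1-D float vectors) is my reading of the released files and should be verified against an actual scene318 pkl; the movie ID and feature values are placeholders.

```python
import pickle
import numpy as np

# Hypothetical sketch: per-shot features keyed by zero-padded shot ID,
# each a fixed-length 1-D vector, saved as one pkl per movie.
shot_feats = {f"{i:04d}": np.zeros(512, dtype=np.float32) for i in range(3)}

with open("tt0000000.pkl", "wb") as f:  # movie ID is a placeholder
    pickle.dump(shot_feats, f)

with open("tt0000000.pkl", "rb") as f:
    loaded = pickle.load(f)

assert loaded["0000"].shape == (512,)
```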

xpngzhng avatar Sep 21 '20 12:09 xpngzhng

Hi @xpngzhng

You have done a good job. What you did is correct.

After discussing it with my collaborators: the feature size cannot match the previous one, since we updated the backbone and the model to a better version. As we said, the project is an ongoing effort; we are iteratively making it better, and this causes the version mismatch.

We should also clarify that the cast feature doesn't match the previous one either. In the previous version, the cast feature (dim=512) is the concatenation of the face feature (dim=256) and the body feature (dim=256). Now the face feature has dimension 512 and the body feature 256. You may need to take note of this.
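As a sketch of the concatenation described above (dimensions are taken from that paragraph; the variable names and zero vectors are purely illustrative):

```python
import numpy as np

# Previous version: cast feature (512) = face feature (256) + body feature (256)
face_old = np.zeros(256, dtype=np.float32)
body_old = np.zeros(256, dtype=np.float32)
cast_old = np.concatenate([face_old, body_old])
assert cast_old.shape == (512,)

# Current movienet-tools models: face feature is 512-D, body feature is 256-D,
# so a naive concatenation yields 768-D and will not match the scene318 files.
face_new = np.zeros(512, dtype=np.float32)
body_new = np.zeros(256, dtype=np.float32)
cast_new = np.concatenate([face_new, body_new])
assert cast_new.shape == (768,)
```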

The good news is that the release of videos goes to the final round (this Wednesday). You may extract the features from the videos then.

You are cordially invited to contact us via email if you have any further questions, as the CVPR deadline is approaching. We will try our best to adapt to the purpose of your usage.

Best,


AnyiRao avatar Sep 21 '20 13:09 AnyiRao

Hi @AnyiRao Many thanks to you and your team

xpngzhng avatar Sep 22 '20 01:09 xpngzhng

Hi,

Can you kindly advise how to understand the prediction output from "python run.py ../run/xxx/xxx.py"

For example:


demo 0020 1 1
demo 0021 1 1
demo 0022 1 1


What does each column represent?

Thanks so much and have a good day!

miaoqiz avatar Sep 22 '20 20:09 miaoqiz

Hi @miaoqiz

The function to write the output is as follows, https://github.com/AnyiRao/SceneSeg/blob/master/lgss/utilis/dataset_utilis.py#L160

You can also see that the format is videoid/imdbid shotid groundtruth prediction. To be exact, demo 0020 1 1 means that both the ground truth and the prediction mark the boundary between shots 0020 and 0021 as a scene boundary.
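A minimal parser for that output could look like the following; the column order follows the template above, and the sample lines (including the non-boundary row) are made up for illustration:

```python
# Parse "videoid shotid gt pred" prediction lines into scene boundaries.
lines = """demo 0020 1 1
demo 0021 1 1
demo 0022 0 0""".splitlines()

boundaries = []
for line in lines:
    videoid, shotid, gt, pred = line.split()
    if pred == "1":  # a predicted scene boundary after this shot
        boundaries.append((videoid, shotid))

print(boundaries)  # [('demo', '0020'), ('demo', '0021')]
```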

Best,


AnyiRao avatar Sep 23 '20 00:09 AnyiRao

Thanks so much! @AnyiRao

How do I find the timecode / frame range of a predicted shot? For example, how do I find the specifics of shot "0020"?

Thanks!

miaoqiz avatar Sep 23 '20 00:09 miaoqiz

Hi @miaoqiz

If you follow my file naming rule, shot_txt has the shot-to-frame correspondence needed to recover the time of each scene. You may also refer here for scene318: https://github.com/AnyiRao/SceneSeg/blob/master/docs/INSTALL.md#explanation
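Assuming each line of a shot_txt file holds the start and end frame of one shot and the line index equals the shot ID (my reading of the INSTALL doc's explanation; verify against your own files), recovering a timecode is just frame / fps:

```python
def frame_to_timecode(frame, fps=24.0):
    """Convert a frame index to an HH:MM:SS.mmm timecode string."""
    seconds = frame / fps
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

# Hypothetical shot_txt content: one "start_frame end_frame" pair per line,
# where line index == shot ID (e.g. shot 0001 is line 1).
shot_txt = ["0 95", "96 214", "215 330"]
start, end = map(int, shot_txt[1].split())
print(frame_to_timecode(start), frame_to_timecode(end))
```

The fps value here is an assumption; use the actual frame rate of your video when converting.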

Best,


AnyiRao avatar Sep 23 '20 01:09 AnyiRao

Hi @AnyiRao ,

Thank you for sharing your codebase.

What model did you use for face feature extraction? I am trying to replicate your results; however, as you mentioned, the face feature extractor from movienet-tools has a 512-length output whereas yours is 256.

jblemoine avatar Jan 13 '21 17:01 jblemoine

Hi @AnyiRao ,

Thank you for the released code. When will you release the raw videos? If the raw videos cannot be released soon, would you be willing to release the model you used to extract cast_feat?

Thank you!

Eniac-Xie avatar Mar 12 '21 08:03 Eniac-Xie

Hi @AnyiRao

Thank you for sharing the codebase of the LGSS framework. According to your reply above, the raw videos should have been released on 23 Sept 2020. May I know how to obtain the raw videos?

Thank you very much !

The good news is that the release of videos goes to the final round (this Wednesday). You may extract the features from the videos then.

onnkeat avatar Apr 21 '21 20:04 onnkeat

Hi @AnyiRao

Many congratulations on the awesome work. Going through this thread, I realize that the models for face and action feature extraction have changed, and because of this I am getting size mismatch errors. I need to extract these features for a custom dataset that my team has created for scene boundary detection and also compare it with your approach.

I would be grateful if you could provide the models and scripts for face and action feature extraction that you used. I have successfully modified the movienet-tools library to be compatible with our dataset, but I am facing issues when I run the scene segmentation model (all.py config) due to the size mismatch.

ra1995 avatar Jun 07 '21 19:06 ra1995