UnsupervisedVideoSummarization
Source code for the paper "Unsupervised Video Summarization via Multi-source Features" published at ICMR 2021
Unsupervised Video Summarization via Multi-source Features
This is the official GitHub page for the paper:
Hussain Kanafani, Junaid Ahmed Ghauri, Sherzod Hakimov, Ralph Ewerth. 2021. Unsupervised Video Summarization via Multi-source Features. In Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR ’21), August 21–24, 2021, Taipei, Taiwan. ACM, New York, NY, USA. https://doi.org/10.1145/3460426.3463597
The paper is available on:
- arXiv: https://arxiv.org/pdf/2105.12532.pdf
Model architecture: Multi-Source Chunk and Stride Fusion (MCSF)
Get started (Requirements and Setup)
python 3.6
cd MCSF
conda create -n mcsf python=3.6
conda activate mcsf
pip install -r requirements.txt
Project Structure
Directory:
- /data
- /plc_365 (Places365 features for SumMe and TVSum)
- /splits (original and non-overlapping splits)
- /SumMe (processed dataset h5)
- /TVSum (processed dataset h5)
- /csnet (implementation of the CSNet method)
- /mcsf-places365-early-fusion
- /mcsf-places365-late-fusion
- /mcsf-places365-intermediate-fusion
- /src/evaluation (evaluation using F1-score)
- /src/visualization
- /sum-ind (implementation of SUM-Ind method)
Datasets
Structured h5 files with the video features and annotations of the SumMe and TVSum datasets are available within the "data" folder. The GoogleNet features of the video frames were extracted by Ke Zhang and Wei-Lun Chao, and the h5 files were obtained from Kaiyang Zhou.
Download
wget https://zenodo.org/record/4884870/files/datasets.tar
Files Structure
The implemented models use the provided h5 files which have the following structure:
/key
/features 2D-array with shape (n_steps, feature-dimension)
/gtscore 1D-array with shape (n_steps), stores ground truth importance score (used for training, e.g. regression loss)
/user_summary 2D-array with shape (num_users, n_frames), each row is a binary vector (used for test)
/change_points 2D-array with shape (num_segments, 2), each row stores indices of a segment
/n_frame_per_seg 1D-array with shape (num_segments), indicates number of frames in each segment
/n_frames number of frames in original video
/picks positions of subsampled frames in original video
/n_steps number of subsampled frames
/gtsummary 1D-array with shape (n_steps), ground truth summary provided by user (used for training, e.g. maximum likelihood)
/video_name (optional) original video name, only available for SumMe dataset
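As a rough illustration of how these fields relate to one another, the sketch below checks one video entry for internal consistency and expands per-step scores to frame level using /picks. It assumes the entry has already been read into plain Python lists and ints (reading the h5 file itself is not shown), that each /change_points row is an inclusive [start, end] pair, and that the score of a subsampled frame is held until the next pick. The function names check_entry and scores_to_frames are hypothetical and not part of this repository.

```python
def check_entry(entry):
    """Return a list of consistency problems found in one video entry."""
    problems = []
    n_steps = entry["n_steps"]
    n_frames = entry["n_frames"]

    # Per-step arrays should all contain n_steps elements.
    for key in ("features", "gtscore", "gtsummary", "picks"):
        if len(entry[key]) != n_steps:
            problems.append(f"{key}: expected {n_steps} items, got {len(entry[key])}")

    # Each user summary is a frame-level binary vector of length n_frames.
    for i, row in enumerate(entry["user_summary"]):
        if len(row) != n_frames:
            problems.append(f"user_summary[{i}]: expected {n_frames} items, got {len(row)}")

    # Assumption: change_points rows are inclusive [start, end] frame indices.
    cps = entry["change_points"]
    nfps = entry["n_frame_per_seg"]
    if len(cps) != len(nfps):
        problems.append("change_points and n_frame_per_seg differ in length")
    for i, ((start, end), n) in enumerate(zip(cps, nfps)):
        if end - start + 1 != n:
            problems.append(f"segment {i}: [{start}, {end}] does not span {n} frames")
    return problems


def scores_to_frames(step_scores, picks, n_frames):
    """Expand per-step scores to one score per original frame.

    Assumption: each score applies from its pick up to (not including)
    the next pick; frames before the first pick keep a score of 0.0.
    """
    frame_scores = [0.0] * n_frames
    for i, start in enumerate(picks):
        end = picks[i + 1] if i + 1 < len(picks) else n_frames
        for frame in range(start, end):
            frame_scores[frame] = step_scores[i]
    return frame_scores
```

A check like this can be useful before training on a new or regenerated h5 file, since a mismatch between /picks and /n_steps would otherwise only surface later during evaluation.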
Original videos and annotations for each dataset are also available on the authors' project webpages:
TVSum dataset: https://github.com/yalesong/tvsum
SumMe dataset: https://gyglim.github.io/me/vsum/index.html#benchmark
MCSF Variations and CSNet
We used the SUM-GAN method as a starting point for the implementation.
How to train
Run main.py with the configurations specified in configs.py to train the model. The following arguments are available for training:
Parameter | type | default |
---|---|---|
mode | string (train or test) | train |
verbose | boolean | true |
video_type | string (summe or tvsum) | summe |
input_size | int | 1024 |
hidden_size | int | 500 |
split_index | int | 0 |
n_epochs | int | 20 |
m | int (number of divisions used for chunk and stride network) | 4 |
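To make the table concrete, here is a minimal sketch of how arguments with these names, types, and defaults could be declared with argparse. It is an illustration only: the configuration code in this repository may be organized differently, and apart from --split_index the flag spellings are assumed from the parameter names. build_parser and str2bool are hypothetical helpers.

```python
import argparse


def str2bool(value):
    """Parse a textual boolean (argparse's type=bool treats any non-empty string as True)."""
    if isinstance(value, bool):
        return value
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser mirroring the parameter table above (names and defaults only)."""
    parser = argparse.ArgumentParser(description="Training arguments (sketch)")
    parser.add_argument("--mode", type=str, default="train", choices=["train", "test"])
    parser.add_argument("--verbose", type=str2bool, default=True)
    parser.add_argument("--video_type", type=str, default="summe", choices=["summe", "tvsum"])
    parser.add_argument("--input_size", type=int, default=1024)
    parser.add_argument("--hidden_size", type=int, default=500)
    parser.add_argument("--split_index", type=int, default=0)
    parser.add_argument("--n_epochs", type=int, default=20)
    # Number of divisions used for the chunk and stride network.
    parser.add_argument("--m", type=int, default=4)
    return parser
```

Calling build_parser().parse_args([]) yields the defaults listed in the table, and any of them can be overridden on the command line.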
For training the model using a single split, run:
python main.py --split_index N (with N being the index of the split)
How to evaluate
Using multiple human-generated summaries per video: CSNet and all MCSF models are evaluated by comparing, after each training epoch, the generated summary for each test video against the set of reference human summaries available for that video (see the '/user_summary' entry in the h5 file structure under Files Structure above). To run the evaluation, specify which config file to use ('config_summe.yaml' or 'config_tvsum.yaml') and then run the 'src/evaluation/evaluate.py' script.
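The comparison against several reference summaries can be sketched as follows. This is not the code in the evaluation script; it is a simplified illustration that assumes frame-level binary summaries and combines the per-user F1-scores either by maximum or by average. Taking the maximum for SumMe and the average for TVSum is a convention often used in the literature and is stated here as an assumption. The function names are hypothetical.

```python
def f1_score(machine, user):
    """F1-score between two frame-level binary summaries of equal length."""
    overlap = sum(m * u for m, u in zip(machine, user))
    selected = sum(machine)
    reference = sum(user)
    if overlap == 0 or selected == 0 or reference == 0:
        return 0.0
    precision = overlap / selected
    recall = overlap / reference
    return 2 * precision * recall / (precision + recall)


def evaluate_summary(machine, user_summaries, reduce="avg"):
    """Score one generated summary against all human summaries of a video.

    reduce="max" keeps the best-matching user, reduce="avg" averages over users.
    """
    scores = [f1_score(machine, user) for user in user_summaries]
    if reduce == "max":
        return max(scores)
    if reduce == "avg":
        return sum(scores) / len(scores)
    raise ValueError(f"unknown reduce mode: {reduce!r}")
```

Here machine is the generated binary summary of length n_frames and user_summaries corresponds to the rows of the '/user_summary' entry.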
SUM-Ind
The training and testing code is in main.py. To see the detailed arguments, run python main.py -h.
How to train
python main.py -d datasets/eccv16_dataset_summe_google_pool5.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --verbose
How to test
python main.py -d datasets/eccv16_dataset_summe_google_pool5.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --evaluate --resume path_to_your_model.pth.tar --verbose --save-results
Citation
@inproceedings{kanafani2021MCSF,
title={Unsupervised Video Summarization via Multi-source Features},
author={Kanafani, Hussain and Ghauri, Junaid Ahmed and Hakimov, Sherzod and Ewerth, Ralph},
booktitle={Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR '21)},
year={2021},
doi={10.1145/3460426.3463597}
}