Code for the CVPR 2020 paper 'Action Modifiers: Learning from Adverbs in Instructional Videos'

Action Modifiers

Code and data for the CVPR 2020 paper 'Action Modifiers: Learning from Adverbs in Instructional Videos'.


Training/Test Splits

The files containing the adverb annotations can be found in train.csv and test.csv. The files contain the following columns:

| Column Name | Type | Example | Description |
|---|---|---|---|
| id | int | 955 | Unique id for this adverb-action annotation |
| vid_id | string | S7wF6S5ywo4 | YouTube id for the video the annotation is for |
| weak_timestamp | float | 19.435 | Value in seconds of the action-adverb in the narration |
| clustered_adverb | string | quickly | Annotated adverb |
| clustered_action | string | cut | Annotated action |
| task_num | int | 105259 | The id for the task in the HowTo100M dataset |
| adverb | string | fast | The original adverb from the narration |
| action | string | slice | The original action from the narration |
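
As a minimal sketch of reading these annotations, the snippet below parses rows with Python's csv module and converts the numeric columns. It assumes the files are comma-separated with a header row carrying the column names listed above; the helper name read_annotations and the in-memory SAMPLE row (built from the example values) are illustrative only.

```python
import csv
import io

# Column types taken from the column description; other columns stay strings.
COLUMN_TYPES = {"id": int, "weak_timestamp": float, "task_num": int}


def read_annotations(handle):
    """Yield one dict per annotation row, converting the numeric columns."""
    for row in csv.DictReader(handle):
        yield {key: COLUMN_TYPES.get(key, str)(value) for key, value in row.items()}


# Hypothetical sample row assembled from the example values (the header order
# is an assumption). For the real files use open("train.csv", newline="").
SAMPLE = (
    "id,vid_id,weak_timestamp,clustered_adverb,clustered_action,task_num,adverb,action\n"
    "955,S7wF6S5ywo4,19.435,quickly,cut,105259,fast,slice\n"
)

annotations = list(read_annotations(io.StringIO(SAMPLE)))
print(annotations[0]["clustered_adverb"], annotations[0]["clustered_action"])
```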


The features can be downloaded here:

This contains two files per entry in train.csv or test.csv, one for RGB features, one for flow features.

Files are named <annotation_id>_<modality>.npz.
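
The naming pattern can be turned into file paths as sketched below. The modality strings "rgb" and "flow" and the helper name feature_paths are assumptions made for illustration; check the downloaded files for the exact names.

```python
from pathlib import Path

# Assumed modality strings; the downloaded archive may use different ones.
MODALITIES = ("rgb", "flow")


def feature_paths(feature_dir, annotation_id):
    """Return {modality: path} following the <annotation_id>_<modality>.npz pattern."""
    root = Path(feature_dir)
    return {m: root / f"{annotation_id}_{m}.npz" for m in MODALITIES}


paths = feature_paths("features", 955)
print(paths["rgb"].name)
```

Each file can then be opened with numpy.load; the .files attribute of the returned object lists the arrays stored inside, whose names are not documented here.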


The videos can be downloaded using: python utils/ <train.csv|test.csv> <download_dir> --trim 20

The --trim 20 argument extracts 20 seconds around the weak timestamp, matching the window used to extract the features.
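
For intuition, the trimmed window can be computed as in the sketch below. It assumes the window is centred on the weak timestamp and clipped at the start of the video; the download script may place the window differently, and the function name trim_window is hypothetical.

```python
def trim_window(weak_timestamp, trim=20.0):
    """Return (start, end) in seconds of a `trim`-second window around the timestamp."""
    half = trim / 2.0
    # Assumption: centred window, never starting before the beginning of the video.
    start = max(0.0, weak_timestamp - half)
    end = weak_timestamp + half
    return start, end


start, end = trim_window(19.435)
print(f"{start:.3f} -> {end:.3f}")
```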

Other useful files

antonym.csv lists each adverb and its antonym

adverb_clusters.csv lists the clusters of adverbs with the following columns:

| Column Name | Type | Example | Description |
|---|---|---|---|
| adverb_id | int | 0 | ID of this adverb |
| cluster_key | string | coarsely | Main adverb representing the cluster |
| adverbs | list of strings | ['coarsley', 'coarse', 'thickly', 'not finely', 'not fine'] | Narrated adverbs in this cluster |

action_clusters.csv is defined similarly
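
A cluster file can be inverted into a lookup from narrated adverb to cluster key, as in the sketch below. It assumes the adverbs column is stored as a Python-style list literal (as in the example row) inside a comma-separated file with a header; the function name adverb_to_cluster and the SAMPLE text are illustrative.

```python
import ast
import csv
import io


def adverb_to_cluster(handle):
    """Map every narrated adverb to the cluster_key of its cluster."""
    mapping = {}
    for row in csv.DictReader(handle):
        # Assumption: the cell holds a list literal such as "['coarse', 'thickly']".
        for adverb in ast.literal_eval(row["adverbs"]):
            mapping[adverb] = row["cluster_key"]
    return mapping


# Hypothetical file content built from the example row.
SAMPLE = '''adverb_id,cluster_key,adverbs
0,coarsely,"['coarsley', 'coarse', 'thickly', 'not finely', 'not fine']"
'''

mapping = adverb_to_cluster(io.StringIO(SAMPLE))
print(mapping["thickly"])
```

The same function should work for action_clusters.csv if the column names are adjusted accordingly.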



To train the model run:

python --feature-dir <path_to_directory_containing_features> --checkpoint-dir <path_to_save_checkpoints_to>

To train the model without first training the action embedding run:

python --no-pretrain-action --temporal-agg <sdp|average|single> --feature-dir <path_to_directory_containing_features> --checkpoint-dir <path_to_save_checkpoints_to>


To test a model run:

python --load <checkpoint_path> --temporal-agg <sdp|average|single> --feature-dir <path_to_features>


Models corresponding to results in the paper can be found under models/. They are:

  • full_model.ckpt - the final result in the paper
  • sdp.ckpt - the proposed model without the first stage of only training the action embedding
  • average.ckpt - action modifiers without the temporal attention
  • single.ckpt - action modifiers using only the one-second segment around the weak timestamp
  • action.ckpt - a pretrained action embedding with scaled dot-product attention without action modifiers
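
To make the three --temporal-agg options concrete, here is a rough pure-Python sketch of what each one computes over a sequence of per-second feature vectors. It is not the repository's implementation: the function name aggregate, the use of a raw query vector in place of the action embedding, the absence of learned projections, and the choice of the middle segment for single are all assumptions made for illustration.

```python
import math


def aggregate(features, mode, query=None, center=None):
    """Pool T feature vectors (one per second, each of length D) into one vector."""
    dim = len(features[0])
    if mode == "average":
        # Uniform weighting over time, i.e. no temporal attention.
        return [sum(f[i] for f in features) / len(features) for i in range(dim)]
    if mode == "single":
        # Assumption: the weak timestamp sits at the middle of the window.
        idx = len(features) // 2 if center is None else center
        return list(features[idx])
    if mode == "sdp":
        if query is None:
            raise ValueError("sdp aggregation needs a query vector")
        # Scaled dot-product scores between the query and each time step.
        scores = [sum(q * x for q, x in zip(query, f)) / math.sqrt(dim) for f in features]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    raise ValueError(f"unknown mode: {mode}")


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
avg = aggregate(feats, "average")
single = aggregate(feats, "single")
attended = aggregate(feats, "sdp", query=[0.0, 0.0])
print(avg, single, attended)
```

With an all-zero query every time step receives the same attention weight, so the sdp result coincides with the average; a query aligned with particular segments shifts the weight towards them.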

Subtitle Parsing

To parse subtitles for action-adverb pairs, you first need to download the subtitles and punctuated texts. Alternatively, you can punctuate your own subtitles with this tool.

Then run:

python <path_to_subtitles> <path_to_punctuated_texts> output.csv --adverb-file data/adverbs.csv --action-file data/actions.csv --task-list data/tasks.csv

--adverb-file, --action-file and --task-list are optional arguments used to filter the search space.