TemporalMaxer
TemporalMaxer copied to clipboard
TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization
TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization
This is the official implementation of the paper TemporalMaxer.
We release the training and testing code for THUMOS, EPIC-Kitchen 100 (verb, noun), and MultiTHUMOS datasets.
Operating Systems and Specs
- Ubuntu 18.04.5 LTS
- NVIDIA RTX A6000
- NVIDIA-SMI 520.61.05 - Driver Version: 520.61.05 - CUDA Version: 11.8
Table of Contents
- Introduction
- Installation
- Usage
- Citation
Introduction
Recent studies have emphasized the importance of applying long-term temporal context modeling (TCM) blocks to the extracted video clip features such as employing complex self-attention mechanisms. In this paper, we present the simplest method ever to address this task and argue that the extracted video clip features are already informative to achieve outstanding performance without sophisticated architectures. To this end, we introduce TemporalMaxer, which minimizes long-term temporal context modeling while maximizing information from the extracted video clip features with a basic, parameter-free, and local region operating max-pooling block. Picking out only the most critical information for adjacent and local clip embeddings, this block results in a more efficient TAL model. We demonstrate that TemporalMaxer outperforms other state-of-the-art methods that utilize long-term TCM such as self-attention on various TAL datasets while requiring significantly fewer parameters and computational resources.

Installation
a. Install packages
conda create -n TemporalMaxer python=3.9
conda activate TemporalMaxer
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch -y
python -m pip install -r requirements.txt
pip install -e ./
b. Build NMS
Part of NMS is implemented in C++. The code can be compiled by
cd ./libs/utils; python setup.py install; cd ../..
The code should be recompiled every time you update PyTorch.
Usage
Reproduce Our Results on EPIC-Kitchens 100
Download Features and Annotations
- Download epic_kitchens.tar.gz from this link, md5sum
6cbf312eb5025c0abbbf5d0eaa61e556
. - Make
data
folder in the current code directory. - The data folder structure should look like the following:
# This folder
├── configs
│ ├── temporalmaxer_epic_slowfast_noun.yaml
│ └── temporalmaxer_epic_slowfast_verb.yaml
│ └── ........
├── data
│ ├── epic_kitchens
│ ├── annotations
│ └── features
├── eval.py
├── figures
├── libs
........
- Train and test EPIC-Kitchens 100
VERB
# training
./scripts/epic_verb/train.sh
# testing
./scripts/epic_verb/test.sh
- Train and test EPIC-Kitchens 100
NOUN
# training
./scripts/epic_noun/train.sh
# testing
./scripts/epic_noun/test.sh
- The results should be:
Method | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | Avg |
---|---|---|---|---|---|---|
TemporalMaxer (verb) | 27.8 | 26.6 | 25.3 | 23.1 | 19.9 | 24.5 |
TemporalMaxer (noun) | 26.3 | 25.2 | 23.5 | 21.3 | 17.6 | 22.8 |
Reproduce Our Results on THUMOS
Download Features and Annotations
- Download thumos.tar.gz from this link, md5sum
1f71c37dba55d549e4b02841d0dcf603
. - Make
data
folder in the current code directory. - The data folder structure should look like the following:
# This folder
├── configs
│ └── ........
│ └── temporalmaxer_thumos_i3d.yaml
├── data
│ ├── thumos
│ ├── annotations
│ └── i3d_features
├── eval.py
├── figures
├── libs
........
- Train and test THUMOS
# training
./scripts/thumos/train.sh
# testing
./scripts/thumos/test.sh
- The results should be:
Method | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | Avg |
---|---|---|---|---|---|---|
TemporalMaxer | 82.8 | 78.9 | 71.8 | 60.5 | 44.7 | 67.7 |
Reproduce Our Results on MultiTHUMOS
Download Features and Annotations
- Download multithumos.tar.gz from this link, md5sum
5b2477678abd612440b6099e349442ad
. - Make
data
folder in the current code directory. - The data folder structure should look like the following:
# This folder
├── configs
│ └── ........
│ └── temporalmaxer_multithumos_i3d.yaml
│ └── ........
├── data
│ ├── thumos
│ │ ├── annotations
│ │ └── i3d_features
│ ├── multithumos
│ ├── ........
│ └── multithumos.json
│ └── ........
├── eval.py
├── figures
├── libs
........
- Train and test MultiTHUMOS
# training
./scripts/multithumos/train.sh
# testing
./scripts/multithumos/test.sh
- The results should be:
Method | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | Avg |
---|---|---|---|---|---|---|---|---|---|---|
TemporalMaxer | 49.1 | 47.5 | 44.3 | 39.4 | 33.4 | 26.5 | 17.4 | 9.1 | 2.24 | 29.9 |
Citation
Please cite the paper in your publications if it helps your research:
@article{tang2023temporalmaxer,
title={TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization},
author={Tang, Tuan N and Kim, Kwonyoung and Sohn, Kwanghoon},
journal={arXiv preprint arXiv:2303.09055},
year={2023}
}