PSL
Source code for ICASSP2022 "Pseudo Strong labels for large scale weakly supervised audio tagging"
Pseudo Strong Labels
This repository contains the source code for our ICASSP2022 paper Pseudo strong labels for large scale weakly supervised audio tagging.
Highlights:
- State-of-the-art on the balanced Audioset subset.
- Simple MobileNetV2 model; no expensive GPU is needed to run it.
- Quick training, since only 60h of balanced Audioset is required.
- Achieves an mAP of roughly 35.48, usable for most real-world applications.
The aim of this work is to show that by adding automatic supervision on a fixed scale from a machine annotator (or teacher) to a student model, performance gains can be observed on Audioset.
Specifically, our method outperforms other approaches in the literature on the balanced
subset of Audioset, while using a comparatively simple MobileNetV2 architecture.
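The idea of fixed-scale supervision can be illustrated with a minimal sketch. It assumes the teacher returns per-class probabilities for each chunk and that chunks are consecutive and non-overlapping; `split_into_chunks`, `pseudo_strong_labels` and `toy_teacher` are hypothetical names for illustration and are not taken from this repository.

```python
from typing import Callable, List, Sequence

# Hypothetical teacher type: maps a chunk of samples to per-class probabilities.
Teacher = Callable[[Sequence[float]], List[float]]


def split_into_chunks(samples: Sequence[float], chunk_len: int) -> List[Sequence[float]]:
    """Split a clip into consecutive, non-overlapping chunks of chunk_len samples.

    A trailing remainder shorter than chunk_len is kept as its own chunk.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def pseudo_strong_labels(samples: Sequence[float], chunk_len: int,
                         teacher: Teacher) -> List[List[float]]:
    """Return one soft label vector per chunk, as predicted by the teacher."""
    return [teacher(chunk) for chunk in split_into_chunks(samples, chunk_len)]


def toy_teacher(chunk: Sequence[float]) -> List[float]:
    # Stand-in for a trained model (assumption): class 0 fires on loud chunks,
    # class 1 on quiet ones.
    energy = sum(x * x for x in chunk) / max(len(chunk), 1)
    p_loud = min(1.0, energy)
    return [p_loud, 1.0 - p_loud]


if __name__ == "__main__":
    clip = [0.0] * 4 + [1.0] * 4 + [0.5] * 2  # a 10-sample toy "clip"
    for chunk_len in (10, 5, 2):              # mirrors the 10s / 5s / 2s setups
        labels = pseudo_strong_labels(clip, chunk_len, toy_teacher)
        print(chunk_len, len(labels))
```

Shorter chunks yield more label vectors per clip, so the student receives supervision at a finer time scale than a single clip-level weak label.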
Method | Label | mAP | $d'$ |
---|---|---|---|
Baseline (Weak) | Weak | 17.69 | 1.994 |
PSL-10s (Proposed) | PSL-10s | 31.13 | 2.454 |
PSL-5s (Proposed) | PSL-5s | 34.11 | 2.549 |
PSL-2s (Proposed) | PSL-2s | 35.48 | 2.588 |
----------------------------------- | ---------------------------------------- | ------------- | ------------- |
CNN14 [@Kong2020d] | Weak | 27.80 | 1.850 |
EfficientNet-B0 [@gong2021psla] | Weak | 33.50 | - |
EfficientNet-B2 [@gong2021psla] | Weak | 34.06 | - |
ResNet-50 [@gong2021psla] | Weak | 31.80 | - |
AST [@gong21b_interspeech] | Weak | 34.70 | - |
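The $d'$ column is not defined in this README. A definition commonly used in the Audioset literature is $d' = \sqrt{2}\,\Phi^{-1}(\mathrm{AUC})$, where $\Phi^{-1}$ is the inverse standard normal CDF. The sketch below assumes that definition; whether this repository computes $d'$ in exactly this way (for example, from a class-averaged AUC) is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def d_prime(auc: float) -> float:
    """Convert an AUC in (0, 1) to d' = sqrt(2) * Phi^{-1}(AUC)."""
    if not 0.0 < auc < 1.0:
        raise ValueError("auc must lie strictly between 0 and 1")
    return sqrt(2.0) * NormalDist().inv_cdf(auc)


def auc_from_d_prime(d: float) -> float:
    """Inverse mapping: AUC = Phi(d' / sqrt(2))."""
    return NormalDist().cdf(d / sqrt(2.0))


if __name__ == "__main__":
    # Map the d' values of the proposed systems back to the implied AUC.
    for d in (2.454, 2.549, 2.588):
        print(d, round(auc_from_d_prime(d), 4))
```

Under this definition a chance-level AUC of 0.5 corresponds to $d' = 0$, and larger values indicate better separation between positive and negative examples.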
Requirements
Binary package requirements
GNU Parallel is required for the preprocessing. It can be installed using conda:
conda install parallel
If you have root rights, you can instead install it with your system package manager:
# On Arch distros
sudo pacman -S parallel
# On Debian
sudo apt install parallel
Further, the download script scripts/1_download_audioset.sh
uses Proxychains to download the data. If you do not need a proxy, disable Proxychains by removing the corresponding line from the script; otherwise, configure your own Proxychains proxy.
Python requirements
This code has been tested using python=3.8
on Centos 5 and Manjaro.
To install the python dependencies just run:
python3 -m pip install -r requirements.txt
Training preparation
The structure of this repo is as follows:
.
├── configs
├── data
│ ├── audio
│ │ ├── balanced
│ │ └── eval
│ ├── csvs
│ └── logs
├── figures
├── scripts
│ └── utils
[Optional] Preparation without downloading the dataset
If you have already downloaded Audioset, put the raw data of the balanced and eval subsets in data/audio/balanced and data/audio/eval, respectively.
Then put balanced_train_segments.csv, eval_segments.csv and class_labels_indices.csv into data/csvs.
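Before running the preparation scripts on manually placed data, it can help to confirm that the expected layout is in place. The following is a minimal sketch that uses only the paths named above; the helper `missing_paths` is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List

# Paths taken from the layout described above, relative to the repo root.
EXPECTED_DIRS = ("data/audio/balanced", "data/audio/eval")
EXPECTED_CSVS = (
    "data/csvs/balanced_train_segments.csv",
    "data/csvs/eval_segments.csv",
    "data/csvs/class_labels_indices.csv",
)


def missing_paths(repo_root: str = ".") -> List[str]:
    """Return the expected directories and CSV files that do not exist yet."""
    root = Path(repo_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_CSVS if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    for path in missing_paths():
        print("missing:", path)
```

An empty result means all five locations exist; it does not verify that the audio files themselves are complete.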
1. Download Data
First, you need the balanced and evaluation subsets of Audioset. These can be downloaded using the following script:
./scripts/1_download_audioset.sh
2. Prepare HDF5
To speed up I/O, we pack the data into HDF5 files. This can be done by running:
./scripts/2_prepare_data.sh
Usage
For the experiments in Table 2, run:
## For the 10s PSL training
./train_psl.sh configs/psl_balanced_chunk_10sec.yaml
## For the 5s PSL training
./train_psl.sh configs/psl_balanced_chunk_5sec.yaml
## For the 2s PSL training
./train_psl.sh configs/psl_balanced_chunk_2sec.yaml
For the experiments in Table 3, run:
## For the 10s PSL training
./train_psl.sh configs/teacher_student_chunk_10sec.yaml
## For the 5s PSL training
./train_psl.sh configs/teacher_student_chunk_5sec.yaml
## For the 2s PSL training
./train_psl.sh configs/teacher_student_chunk_2sec.yaml
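To run several configurations back to back, a small driver along the following lines could be used. This is only a sketch: the script and config paths are the ones listed above, while `build_commands` and `run_all` are hypothetical helpers. It defaults to a dry run that prints the commands without executing them.

```python
import subprocess
from typing import List

TRAIN_SCRIPT = "./train_psl.sh"
TABLE2_CONFIGS = [
    "configs/psl_balanced_chunk_10sec.yaml",
    "configs/psl_balanced_chunk_5sec.yaml",
    "configs/psl_balanced_chunk_2sec.yaml",
]


def build_commands(configs: List[str]) -> List[List[str]]:
    """Build one argument list per config, suitable for subprocess.run."""
    return [[TRAIN_SCRIPT, cfg] for cfg in configs]


def run_all(configs: List[str], dry_run: bool = True) -> None:
    """Print each command; execute it too when dry_run is False."""
    for cmd in build_commands(configs):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(TABLE2_CONFIGS, dry_run=True)
```

Passing `dry_run=False` from the repository root launches the trainings sequentially.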
Note that this repo can be easily extended to run the experiments in Table 4, i.e., using the full Audioset dataset.