FlexSED: Towards Open-Vocabulary Sound Event Detection
FlexSED is an easy-to-use, open-vocabulary sound event detection (SED) system. It can be used for data annotation and labeling, as well as for developing evaluation metrics for audio generation.
News
- Oct 2025: 📦 Released code and pretrained checkpoint
- Sep 2025: 🎉 FlexSED Spotlighted at WASPAA 2025
Installation
Clone the repository:
git clone https://github.com/JHU-LCAP/FlexSED.git
Install the dependencies:
cd FlexSED
pip install -r requirements.txt
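Before running the usage example below (which loads the model on a GPU), it can help to confirm that PyTorch actually sees a CUDA device. This quick check is not part of FlexSED itself, just a generic sanity test:
import torch

# Pick the device the model will run on; fall back to CPU if no GPU is visible.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"PyTorch {torch.__version__}, using device: {device}")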
Usage
from api import FlexSED
import torch
import soundfile as sf
# load model
flexsed = FlexSED(device='cuda')
# run inference
events = ["Door", "Male Speech", "Laughter", "Dog"]
preds = flexsed.run_inference("example.wav", events)
# visualize prediction
flexsed.to_multi_plot(preds, events, fname="example")
# (Optional) visualize prediction as a video
# flexsed.to_multi_video(preds, events, audio_path="example.wav", fname="example")
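The exact structure returned by run_inference depends on the model, so inspect it before post-processing. As a rough, hypothetical sketch: if each queried event comes back as a 1-D array of per-frame probabilities, you could turn it into timestamped segments with a simple threshold. The frame_hop_s and threshold values below are illustrative assumptions, not FlexSED settings:
import numpy as np

def probs_to_segments(probs, frame_hop_s=0.02, threshold=0.5):
    # Turn one per-frame probability track into (onset_s, offset_s) segments.
    # frame_hop_s and threshold are illustrative defaults, not FlexSED settings.
    active = np.asarray(probs).reshape(-1) > threshold
    segments, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = i
        elif not on and start is not None:
            segments.append((start * frame_hop_s, i * frame_hop_s))
            start = None
    if start is not None:
        segments.append((start * frame_hop_s, len(active) * frame_hop_s))
    return segments

# Example usage, assuming preds can be iterated per queried event
# (check the actual return type of run_inference first):
# for event, track in zip(events, preds):
#     print(event, probs_to_segments(track))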
Training
- Download the AudioSet-Strong subset. The dataset is available from both WavCaps and HF-AS-Strong. Thanks to the contributors for providing these resources.
- Prepare metadata following the preprocessing steps. Feel free to check the processed metadata.
  (If you wish to create a validation split, remove a subset of samples from the training metadata and format them the same way as the test metadata. Recommended: ~2000 samples across ~50 sound classes; see the sketch after this list.)
- Update the file paths for both metadata and audio in src/configs.
- Extract CLAP embeddings:
  python src/prepare_clap.py
- Run training:
  python src/train.py
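If you do create a validation split, one simple approach is to randomly hold out a few samples from each of ~50 classes. The sketch below assumes the training metadata is a JSON list of records with a single "label" field; the file names and keys are placeholders, so adapt them to the repo's actual metadata schema:
import json
import random
from collections import defaultdict

# Placeholder paths and key names: adapt them to the repo's actual metadata format.
with open("train_metadata.json") as f:
    records = json.load(f)

random.seed(0)
by_class = defaultdict(list)
for rec in records:
    by_class[rec["label"]].append(rec)  # "label" is an assumed field name

# Hold out roughly 2000 samples across ~50 classes, as recommended above.
val = []
held_out = set()
for cls in random.sample(sorted(by_class), k=min(50, len(by_class))):
    for rec in random.sample(by_class[cls], k=min(40, len(by_class[cls]))):
        val.append(rec)
        held_out.add(id(rec))

train = [rec for rec in records if id(rec) not in held_out]
with open("train_metadata_reduced.json", "w") as f:
    json.dump(train, f)
with open("val_metadata.json", "w") as f:
    json.dump(val, f)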
Reference
If you find the code useful for your research, please consider citing:
@article{hai2025flexsed,
title={FlexSED: Towards Open-Vocabulary Sound Event Detection},
author={Hai, Jiarui and Wang, Helin and Guo, Weizhe and Elhilali, Mounya},
journal={arXiv preprint arXiv:2509.18606},
year={2025}
}