triplet_loss_kws
Learning Efficient Representations for Keyword Spotting with Triplet Loss
Code for the paper *Learning Efficient Representations for Keyword Spotting with Triplet Loss* by Roman Vygon ([email protected]) and Nikolay Mikhaylovskiy ([email protected]).
Prerequisites
Training
To train a triplet encoder, run:

```
python TripletEncoder.py --name=test_encoder --manifest=MANIFEST --model=MODEL
```
To train a no-triplet model, or to train a classifier based on the triplet encoder, run:

```
python TripletClassifier.py --name=test_classifier --manifest=MANIFEST --model=MODEL
```
You can use `--help` to view the descriptions of the arguments.
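For orientation, the triplet objective the encoder optimizes can be sketched in a few lines. This is an illustration of the standard triplet margin loss, not the repository's actual training code; the embeddings and the margin value below are toy values chosen for the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard triplet margin loss: pull the anchor toward the positive
    embedding and push it at least `margin` further from the negative."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)

# Toy 2-D embeddings: anchor already close to the positive and far
# from the negative, so the constraint is satisfied and the loss is 0.
anchor   = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])
negative = np.array([2.0, 0.0])
print(triplet_loss(anchor, positive, negative))  # -> 0.0
```

During training the loss is averaged over a batch of (anchor, positive, negative) triplets, where positives share the anchor's keyword label and negatives do not.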
Hardware Requirements
Training was performed on a single Tesla K80 12GB.
| Model | Batch Size | VRAM |
|---|---|---|
| Res15 | 35*4 | 11 GB |
| Res8 | 35*10 | 4 GB |
Testing
To test a triplet encoder, run:

```
python infer_train.py --name=test_encoder --manifest=MANIFEST --model=MODEL --enc_step=ENCODER_TRAINING_STEP
```
To test a classifier-head model, run:

```
python infer_notl.py --name=test_encoder --cl_name=test_classifier --manifest=MANIFEST --model=MODEL --enc_step=ENCODER_TRAINING_STEP --cl_step=CLASSIFIER_TRAINING_STEP
```
You can use `--help` to view the descriptions of the arguments.
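A common way to turn a metric-learning encoder like this into a keyword classifier is k-nearest-neighbour search over the learned embeddings. The sketch below is illustrative only and is not the repository's inference code; the embeddings, labels, and `k` are made up for the example.

```python
import numpy as np

def knn_predict(query, train_emb, train_labels, k=3):
    """Classify a query embedding by majority vote among its k nearest
    training embeddings under Euclidean distance."""
    dists = np.linalg.norm(train_emb - query, axis=1)  # distance to every example
    nearest = np.argsort(dists)[:k]                    # indices of k closest
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)            # majority label

# Toy setup: two keyword classes forming clusters in embedding space.
train_emb = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
                      [5.0, 5.0], [5.1, 4.9], [4.9, 5.0]])
train_labels = ["yes", "yes", "yes", "no", "no", "no"]
print(knn_predict(np.array([0.05, 0.05]), train_emb, train_labels))  # -> "yes"
```

With a well-trained triplet encoder, same-keyword utterances cluster tightly, so even this simple nearest-neighbour rule classifies accurately.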
License
This project is licensed under the MIT License - see the LICENSE.md file for details.
Datasets
LibriSpeech
You can download the train-clean-360 subset here: http://www.openslr.org/12. If the site doesn't load, see this code for direct links to the files.
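If the site is down, the archives can also be fetched directly. The helper below assumes OpenSLR's usual `resources/12/<subset>.tar.gz` layout for LibriSpeech; the URL pattern is an assumption based on that convention, not taken from the missing link above.

```python
from urllib.request import urlretrieve

# Assumed OpenSLR layout: LibriSpeech archives live under resources/12/.
BASE = "https://www.openslr.org/resources/12"

def librispeech_url(subset):
    """Build the direct download URL for a LibriSpeech subset archive."""
    return f"{BASE}/{subset}.tar.gz"

# Print the direct link rather than downloading it here:
print(librispeech_url("train-clean-360"))
# To actually fetch the archive:
# urlretrieve(librispeech_url("train-clean-360"), "train-clean-360.tar.gz")
```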
Google Speech Commands
Use this notebook to download and prepare the Google Speech Commands dataset.
Additional files
~~Data manifests, LibriSpeech alignments, and distance measures can be found here. You'll need to update the `manifests.json` file with the dataset path. You can convert LibriWords manifests with `convert_path_prefix.ipynb`.~~

Sadly, the files went missing; I'll try to recover them. If anyone had a chance to download them, please contact me.