Result For SpeechTokenizer

Open RayYuki opened this issue 8 months ago • 4 comments

Here is the result for SpeechTokenizer.

The bitrate is 2 kbps. The results are as follows:

Results in exps/results.txt

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition. Acc: 72.15%

Stage 2: Run speaker related evaluation. EER: 4.03%

Stage 3: Run automatic speech recognition. WER: 4.55%

Stage 4: Run audio event classification. ACC: 25.50%


Result in src/codec_metrics/exps/results.txt

Log results

File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation. SDR: mean score is: -29.90983049070145

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 5.345735

Stage 3: Run STOI. stoi: mean score is: 0.06024890838574476

Stage 4: Run PESQ. pesq: mean score is: 1.586073912382126

File Name: esc50.log Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation. SDR: mean score is: -22.282276880645814

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 3.4074209

File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation. SDR: mean score is: 1.5112133717223253

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8877456

Stage 3: Run STOI. stoi: mean score is: 0.8648300690857609

Stage 4: Run PESQ. pesq: mean score is: 2.170962030887604

File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation. SDR: mean score is: -21.45771079855064

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 3.1137948

File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation. SDR: mean score is: -22.950851389668035

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 4.621136

File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation. SDR: mean score is: -3.846337947640395

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9027287

Stage 3: Run STOI. stoi: mean score is: 0.8309377170272262

Stage 4: Run PESQ. pesq: mean score is: 1.5058157062530517

File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation. SDR: mean score is: 1.0211239468849096

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8223095

Stage 3: Run STOI. stoi: mean score is: 0.8872668136911973

Stage 4: Run PESQ. pesq: mean score is: 2.2581932806968688

File Name: quesst.log Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation. SDR: mean score is: -1.774289102870904

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.153448

Stage 3: Run STOI. stoi: mean score is: 0.7758606059083771

Stage 4: Run PESQ. pesq: mean score is: 1.8245106658550223

File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation. SDR: mean score is: 3.7615257663215895

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8986037

Stage 3: Run STOI. stoi: mean score is: 0.9141771654461831

Stage 4: Run PESQ. pesq: mean score is: 2.2321277034282683

File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation. SDR: mean score is: -27.182861328199774

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 5.430982

Stage 3: Run STOI. stoi: mean score is: 0.14532493265232807

Stage 4: Run PESQ. pesq: mean score is: 1.6926373445987701

File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation. SDR: mean score is: -1.9323934995843512

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.823112

Stage 3: Run STOI. stoi: mean score is: 0.8241731080501418

Stage 4: Run PESQ. pesq: mean score is: 1.9483790636062621
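For anyone who wants to tabulate these logs rather than read them by eye, here is a minimal sketch of a parser for the `Stage N: Run ... <metric>: mean score is: <value>` lines. It assumes the log files contain lines in exactly the form pasted above (the real files may have extra output around them), and `parse_log` is a hypothetical helper name, not something from the Codec-SUPERB repo.

```python
import re

# Matches lines such as:
#   "Stage 1: Run SDR evaluation. SDR: mean score is: 1.0211239468849096"
# Assumption: values are plain decimals; scientific notation is not handled.
LINE_RE = re.compile(
    r"Stage \d+: Run .*?\. (\w+): mean score is: (-?\d+(?:\.\d+)?)"
)


def parse_log(text):
    """Return {metric_name_lowercased: mean_score} for one log's text."""
    return {name.lower(): float(value) for name, value in LINE_RE.findall(text)}


# Sample text copied from the librispeech.log block above.
sample = """\
Stage 1: Run SDR evaluation. SDR: mean score is: 1.0211239468849096
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8223095
Stage 3: Run STOI. stoi: mean score is: 0.8872668136911973
Stage 4: Run PESQ. pesq: mean score is: 2.2581932806968688
"""

scores = parse_log(sample)
for metric, value in scores.items():
    print(f"{metric}: {value}")
```

Metric names are lowercased so that `SDR` and `mel_loss` end up with consistent keys.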

Average SDR for speech datasets: -4.06314554192561

Average Mel_Loss for speech datasets: 1.5598471285714286

Average STOI for speech datasets: 0.7489386302658877

Average PESQ for speech datasets: 1.9475179707608352

Average SDR for audio datasets: -22.23027968958767

Average Mel_Loss for audio datasets: 3.714117233333333
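As a sanity check, the averages can be recomputed from the per-dataset numbers in the logs. The sketch below hard-codes the SDR and Mel loss values from this post. The dataset grouping is an inference from the arithmetic, not something stated in the logs: the reported speech averages are reproduced by the seven speech sets without crema_d, and the audio averages by esc50, fsd50k and gunshot_triangulation.

```python
from statistics import mean

# Per-dataset mean scores copied from the logs in this post.
sdr = {
    "fluent_speech_commands": 1.5112133717223253,
    "libri2Mix_test": -3.846337947640395,
    "librispeech": 1.0211239468849096,
    "quesst": -1.774289102870904,
    "snips_test_valid_subset": 3.7615257663215895,
    "vox_lingua_top10": -27.182861328199774,
    "voxceleb1": -1.9323934995843512,
    "esc50": -22.282276880645814,
    "fsd50k": -21.45771079855064,
    "gunshot_triangulation": -22.950851389668035,
}
mel_loss = {
    "fluent_speech_commands": 0.8877456,
    "libri2Mix_test": 0.9027287,
    "librispeech": 0.8223095,
    "quesst": 1.153448,
    "snips_test_valid_subset": 0.8986037,
    "vox_lingua_top10": 5.430982,
    "voxceleb1": 0.823112,
    "esc50": 3.4074209,
    "fsd50k": 3.1137948,
    "gunshot_triangulation": 4.621136,
}

# Assumed grouping (inferred from the reported averages, see note above).
AUDIO = ["esc50", "fsd50k", "gunshot_triangulation"]
SPEECH = [name for name in sdr if name not in AUDIO]

speech_sdr = mean(sdr[d] for d in SPEECH)
speech_mel = mean(mel_loss[d] for d in SPEECH)
audio_sdr = mean(sdr[d] for d in AUDIO)
audio_mel = mean(mel_loss[d] for d in AUDIO)

print(f"speech SDR {speech_sdr:.4f}, speech Mel loss {speech_mel:.4f}")
print(f"audio SDR {audio_sdr:.4f}, audio Mel loss {audio_mel:.4f}")
```

The STOI and PESQ averages can be checked the same way by adding the corresponding dictionaries.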

RayYuki, Jun 18 '24 12:06