Codec-SUPERB
Results for SpeechTokenizer
The results below are for SpeechTokenizer at a bit rate of 2 kbps.
Results in exps/results.txt
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 72.15%
Stage 2: Run speaker related evaluation. EER: 4.03%
Stage 3: Run automatic speech recognition. WER: 4.55%
Stage 4: Run audio event classification. ACC: 25.50%
Results in src/codec_metrics/exps/results.txt
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -29.90983049070145
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 5.345735
Stage 3: Run STOI. stoi: mean score is: 0.06024890838574476
Stage 4: Run PESQ. pesq: mean score is: 1.586073912382126
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -22.282276880645814
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 3.4074209
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 1.5112133717223253
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8877456
Stage 3: Run STOI. stoi: mean score is: 0.8648300690857609
Stage 4: Run PESQ. pesq: mean score is: 2.170962030887604
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -21.45771079855064
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 3.1137948
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -22.950851389668035
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 4.621136
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -3.846337947640395
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9027287
Stage 3: Run STOI. stoi: mean score is: 0.8309377170272262
Stage 4: Run PESQ. pesq: mean score is: 1.5058157062530517
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 1.0211239468849096
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8223095
Stage 3: Run STOI. stoi: mean score is: 0.8872668136911973
Stage 4: Run PESQ. pesq: mean score is: 2.2581932806968688
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -1.774289102870904
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.153448
Stage 3: Run STOI. stoi: mean score is: 0.7758606059083771
Stage 4: Run PESQ. pesq: mean score is: 1.8245106658550223
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 3.7615257663215895
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8986037
Stage 3: Run STOI. stoi: mean score is: 0.9141771654461831
Stage 4: Run PESQ. pesq: mean score is: 2.2321277034282683
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -27.182861328199774
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 5.430982
Stage 3: Run STOI. stoi: mean score is: 0.14532493265232807
Stage 4: Run PESQ. pesq: mean score is: 1.6926373445987701
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: -1.9323934995843512
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.823112
Stage 3: Run STOI. stoi: mean score is: 0.8241731080501418
Stage 4: Run PESQ. pesq: mean score is: 1.9483790636062621
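Each log entry above follows the pattern `<metric>: mean score is: <value>`, so the scores can be collected programmatically. The following is a minimal sketch, assuming the log text looks like the lines shown here; the helper name `parse_metric_lines` is hypothetical and is not part of Codec-SUPERB.

```python
import re

# Matches entries such as "SDR: mean score is: -29.90983049070145".
METRIC_RE = re.compile(
    r"(\w+): mean score is: (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def parse_metric_lines(text):
    """Return {lowercased metric name: float} for each 'mean score is' entry."""
    return {name.lower(): float(value) for name, value in METRIC_RE.findall(text)}


# Sample text copied from the crema_d entry above.
sample = """\
Stage 1: Run SDR evaluation. SDR: mean score is: -29.90983049070145
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 5.345735
Stage 3: Run STOI. stoi: mean score is: 0.06024890838574476
Stage 4: Run PESQ. pesq: mean score is: 1.586073912382126
"""

scores = parse_metric_lines(sample)
print(scores)
```

Lowercasing the metric names gives consistent keys, since the logs print `SDR` in upper case but `mel_loss`, `stoi`, and `pesq` in lower case.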
Average SDR for speech datasets: -4.06314554192561
Average Mel_Loss for speech datasets: 1.5598471285714286
Average STOI for speech datasets: 0.7489386302658877
Average PESQ for speech datasets: 1.9475179707608352
Average SDR for audio datasets: -22.23027968958767
Average Mel_Loss for audio datasets: 3.714117233333333
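The averages can be cross-checked against the per-dataset scores in the logs. The sketch below is an illustration, not the project's own script. The dataset grouping is an inference from the numbers: the reported speech averages are reproduced when crema_d is left out and the remaining seven datasets with STOI/PESQ scores are used, while the audio averages cover esc50, fsd50k, and gunshot_triangulation. The source does not state this grouping explicitly.

```python
from statistics import mean

# Per-dataset mean scores copied from the log results.
# Assumption: crema_d is excluded from the speech group (inferred, see above).
speech = {
    "fluent_speech_commands": {"sdr": 1.5112133717223253, "mel_loss": 0.8877456,
                               "stoi": 0.8648300690857609, "pesq": 2.170962030887604},
    "libri2Mix_test": {"sdr": -3.846337947640395, "mel_loss": 0.9027287,
                       "stoi": 0.8309377170272262, "pesq": 1.5058157062530517},
    "librispeech": {"sdr": 1.0211239468849096, "mel_loss": 0.8223095,
                    "stoi": 0.8872668136911973, "pesq": 2.2581932806968688},
    "quesst": {"sdr": -1.774289102870904, "mel_loss": 1.153448,
               "stoi": 0.7758606059083771, "pesq": 1.8245106658550223},
    "snips_test_valid_subset": {"sdr": 3.7615257663215895, "mel_loss": 0.8986037,
                                "stoi": 0.9141771654461831, "pesq": 2.2321277034282683},
    "vox_lingua_top10": {"sdr": -27.182861328199774, "mel_loss": 5.430982,
                         "stoi": 0.14532493265232807, "pesq": 1.6926373445987701},
    "voxceleb1": {"sdr": -1.9323934995843512, "mel_loss": 0.823112,
                  "stoi": 0.8241731080501418, "pesq": 1.9483790636062621},
}
audio = {
    "esc50": {"sdr": -22.282276880645814, "mel_loss": 3.4074209},
    "fsd50k": {"sdr": -21.45771079855064, "mel_loss": 3.1137948},
    "gunshot_triangulation": {"sdr": -22.950851389668035, "mel_loss": 4.621136},
}


def average(group, metric):
    """Unweighted mean of one metric across the datasets in a group."""
    return mean(scores[metric] for scores in group.values())


for metric in ("sdr", "mel_loss", "stoi", "pesq"):
    print(f"speech {metric}: {average(speech, metric):.4f}")
for metric in ("sdr", "mel_loss"):
    print(f"audio {metric}: {average(audio, metric):.4f}")
```

The mean is taken over datasets, not over individual utterances, so each dataset contributes equally regardless of its size.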