Codec-SUPERB
Results for APCodec
16 kHz, 2 kbps
Parameter size:
encoder (including quantizer): 29 MB
decoder: 40 MB
exps/results.txt
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 74.93%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 3.02%
Stage 3: Run automatic speech recognition. WER: 4.74%
Stage 4: Run audio event classification. ACC: 55.25%
src/codec_metrics/exps/results.txt
Log results
File Name: crema_d.log
Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -2.618520825954788
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.64869004
Stage 3: Run STOI. stoi: mean score is: 0.717766808809779
Stage 4: Run PESQ. pesq: mean score is: 1.5509950947761535
File Name: esc50.log
Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -9.309950038168095
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 2.002597
File Name: fluent_speech_commands.log
Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 2.68255129531442
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.87451327
Stage 3: Run STOI. stoi: mean score is: 0.8740794643709145
Stage 4: Run PESQ. pesq: mean score is: 2.1911674320697783
File Name: fsd50k.log
Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -6.6539098549604345
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.8435475
File Name: gunshot_triangulation.log
Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -3.0264018525811536
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.4431057
File Name: libri2Mix_test.log
Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -1.3850498167169416
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.85650134
Stage 3: Run STOI. stoi: mean score is: 0.8534544293908012
Stage 4: Run PESQ. pesq: mean score is: 1.5768725705146789
File Name: librispeech.log
Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 2.5759249020219706
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8179612
Stage 3: Run STOI. stoi: mean score is: 0.8975456227011622
Stage 4: Run PESQ. pesq: mean score is: 2.2901515591144563
File Name: quesst.log
Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -1.3464429268284184
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9656775
Stage 3: Run STOI. stoi: mean score is: 0.7968180258305204
Stage 4: Run PESQ. pesq: mean score is: 1.7317036986351013
File Name: snips_test_valid_subset.log
Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 4.364046016689939
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8910932
Stage 3: Run STOI. stoi: mean score is: 0.9133034388476792
Stage 4: Run PESQ. pesq: mean score is: 2.245469583272934
File Name: voxceleb1.log
Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: 1.5015711204024194
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78175646
Stage 3: Run STOI. stoi: mean score is: 0.8577775240334691
Stage 4: Run PESQ. pesq: mean score is: 2.120602227449417
File Name: vox_lingua_top10.log
Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -0.22438148479388495
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.6584927
Stage 3: Run STOI. stoi: mean score is: 0.8339094697226409
Stage 4: Run PESQ. pesq: mean score is: 1.8127213382720948
Average SDR for speech datasets: 0.6937122850168396
Average Mel_Loss for speech datasets: 0.8118357137500001
Average STOI for speech datasets: 0.8430818479633708
Average PESQ for speech datasets: 1.9399604380130768
Average SDR for audio datasets: -6.330087248569893
Average Mel_Loss for audio datasets: 1.7630834000000002
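The averages can be cross-checked against the per-dataset logs. The following is a minimal sketch, not the benchmark's own script: `parse_log` and `average_by_group` are hypothetical helper names, the regular expression assumes the `Stage N: Run ... <metric>: mean score is: <value>` line format shown in the logs above, and the audio grouping (esc50, fsd50k, gunshot_triangulation) is an assumption inferred from which logs report only SDR and Mel loss.

```python
import re
from statistics import mean

# Assumed line format, taken from the logs above:
#   "Stage 1: Run SDR evaluation. SDR: mean score is: -9.309950038168095"
SCORE_RE = re.compile(
    r"Stage \d+: Run .*?\. (\w+): mean score is: (-?\d+(?:\.\d+)?)"
)


def parse_log(text):
    """Return {metric_name_in_lowercase: score} for the text of one log."""
    return {m.group(1).lower(): float(m.group(2)) for m in SCORE_RE.finditer(text)}


def average_by_group(logs, groups):
    """Average each metric over the datasets listed in each group.

    logs   : {dataset_name: {metric: score}}
    groups : {group_name: [dataset_name, ...]}
    """
    result = {}
    for group, names in groups.items():
        per_metric = {}
        for name in names:
            for metric, score in logs[name].items():
                per_metric.setdefault(metric, []).append(score)
        result[group] = {metric: mean(scores) for metric, scores in per_metric.items()}
    return result


# Log text copied from the audio-dataset entries above.
raw_logs = {
    "esc50": (
        "Stage 1: Run SDR evaluation. SDR: mean score is: -9.309950038168095\n"
        "Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 2.002597\n"
    ),
    "fsd50k": (
        "Stage 1: Run SDR evaluation. SDR: mean score is: -6.6539098549604345\n"
        "Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.8435475\n"
    ),
    "gunshot_triangulation": (
        "Stage 1: Run SDR evaluation. SDR: mean score is: -3.0264018525811536\n"
        "Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.4431057\n"
    ),
}

logs = {name: parse_log(text) for name, text in raw_logs.items()}

# Assumed grouping; extend with a "speech" entry for the remaining eight logs.
groups = {"audio": ["esc50", "fsd50k", "gunshot_triangulation"]}

averages = average_by_group(logs, groups)
for metric, value in averages["audio"].items():
    print(f"Average {metric} for audio datasets: {value:.4f}")
```

Run on the three audio logs, this should reproduce the audio averages listed above up to floating-point rounding. Adding the other eight logs under a `speech` group gives the speech averages in the same way.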