conv-tasnet
A PyTorch implementation of "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" (see recipes in aps framework https://github.com/funcwj/aps)
ConvTasNet
A PyTorch implementation of "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation".
Requirements
See requirements.txt.
Usage
- training: configure conf.py and run train.sh
- inference
./nnet/separate.py /path/to/checkpoint --input /path/to/mix.scp --gpu 0 > separate.log 2>&1 &
- evaluate
./nnet/compute_si_snr.py /path/to/ref_spk1.scp,/path/to/ref_spk2.scp /path/to/inf_spk1.scp,/path/to/inf_spk2.scp
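The evaluate step reports a scale-invariant signal-to-noise ratio. As a rough illustration of what that metric measures, here is a minimal, dependency-free sketch. It is not the code in `compute_si_snr.py`: the function names `si_snr` and `best_permutation_si_snr` are hypothetical, and the permutation search is an assumption about how two separated outputs might be matched to the two reference speakers.

```python
import math
from itertools import permutations


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Whatever is left after removing the target counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)


def best_permutation_si_snr(estimates, references):
    """Average SI-SNR under the speaker assignment that maximises it.

    Assumption: output order is arbitrary, so every assignment is tried.
    """
    best = -math.inf
    for perm in permutations(range(len(references))):
        total = sum(si_snr(estimates[i], references[p]) for i, p in enumerate(perm))
        best = max(best, total / len(references))
    return best
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric is called scale-invariant. For real waveforms a vectorised NumPy or PyTorch version would be the practical choice; plain lists are used here only to keep the sketch self-contained.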
Results (using the best configuration from the paper)
| ID | Settings | Causal | Norm | Param | Loss | Si-SDR |
|---|---|---|---|---|---|---|
| 0 | adam/lr:1e-3/wd:1e-5/32-batch/2gpu | N | BN/relu | 8.75M | -17.59/-15.45 | 14.63 |
| 1 | adam/lr:1e-2/wd:1e-5/20-batch/2gpu | N | gLN/relu | - | -16.09/-15.21 | 14.58 |
| 2 | adam/lr:1e-3/wd:1e-5/20-batch/2gpu | N | gLN/relu | - | -17.91/-16.54 | 15.87 |
| 3 | adam/lr:1e-2/wd:1e-5/32-batch/2gpu | N | BN/sigmoid | - | -14.51/-13.40 | 12.62 |
| 4 | adam/lr:1e-2/wd:1e-5/32-batch/2gpu | N | BN/relu | - | -17.20/-15.38 | 14.58 |
| 5 | adam/lr:1e-3/wd:1e-5/20-batch/2gpu | N | gLN/sigmoid | - | -17.20/-16.11 | 15.55 |
| 6 | adam/lr:1e-3/wd:1e-5/32-batch/2gpu | Y | BN/relu | - | -15.25/-12.47 | 11.42 |
| 7 | adam/lr:1e-3/wd:1e-5/24-batch/2gpu | N | cLN/relu | - | -18.72/-16.17 | 15.25 |
Reference
Luo Y, Mesgarani N. TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation. arXiv preprint arXiv:1809.07454, 2018.