blockformer

Open LeonWlw opened this issue 2 years ago • 3 comments

This PR implements Blockformer in WeNet (original paper: https://arxiv.org/abs/2207.11697).

  • Implementation details
    • Add an SE (squeeze-and-excitation) layer to ensemble the Conformer encoder outputs
    • Add an SE layer to ensemble the Transformer decoder outputs
    • Use relative positional encoding in the decoder
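
As a rough illustration of the SE ensembling idea, here is a minimal PyTorch sketch, not the code in this PR. It assumes the per-block outputs are stacked as channels, squeezed by a masked mean over time and feature dimensions, and gated by a small two-layer network; the class name `SEBlockEnsemble`, the `reduction` parameter, and the masking scheme are all hypothetical.

```python
import torch
import torch.nn as nn


class SEBlockEnsemble(nn.Module):
    """Hypothetical sketch: fuse the outputs of N blocks with an SE-style gate."""

    def __init__(self, num_blocks: int, reduction: int = 2):
        super().__init__()
        hidden = max(num_blocks // reduction, 1)
        # Excitation: bottleneck MLP that emits one weight in (0, 1) per block.
        self.gate = nn.Sequential(
            nn.Linear(num_blocks, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_blocks),
            nn.Sigmoid(),
        )

    def forward(self, block_outputs, mask):
        # block_outputs: list of N tensors, each (batch, time, dim)
        # mask: (batch, time) bool tensor, True for valid (non-padded) frames
        x = torch.stack(block_outputs, dim=1)            # (B, N, T, D)
        m = mask[:, None, :, None].to(x.dtype)           # (B, 1, T, 1)
        # Squeeze: mean over valid frames and the feature dimension.
        denom = m.sum(dim=(2, 3)).clamp(min=1.0) * x.size(-1)  # (B, 1)
        squeezed = (x * m).sum(dim=(2, 3)) / denom       # (B, N)
        weights = self.gate(squeezed)                    # (B, N)
        # Scale each block output by its weight and sum over blocks.
        return (x * weights[:, :, None, None]).sum(dim=1)  # (B, T, D)


torch.manual_seed(0)
outs = [torch.randn(2, 5, 8) for _ in range(4)]  # 4 blocks, batch 2, 5 frames, dim 8
mask = torch.ones(2, 5, dtype=torch.bool)
mask[1, 3:] = False                              # second utterance has 3 valid frames
ensemble = SEBlockEnsemble(num_blocks=4)
fused = ensemble(outs, mask)
print(fused.shape)
```

The fused tensor has the same shape as a single block output, so under these assumptions it could replace the last block's output before the final layer norm or output projection.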

On the main branch, features are extracted with torchaudio, which gives slightly worse results than those reported in the paper. I will push a branch that uses Kaldi features for the AIShell recipe, which can reproduce the paper's results.

LeonWlw avatar Oct 19 '22 02:10 LeonWlw

I think it's better if we add the experiment results on AIShell-1 and LibriSpeech, to show that we can get consistent and solid gain by using the model.

robin1001 avatar Oct 19 '22 02:10 robin1001

> I think it's better if we add the experiment results on AIShell-1 and LibriSpeech, to show that we can get consistent and solid gain by using the model.

@robin1001 The AIShell results have been added.

LeonWlw avatar Oct 19 '22 12:10 LeonWlw

Hi, I ran Blockformer on a 3080 and it only used 30%-40% of the GPU. I increased the batch size and the number of workers, but it didn't help. What should I do to make better use of the GPU?

903859154 avatar Dec 11 '22 09:12 903859154