Shinji Watanabe comments

Results: 318 comments of Shinji Watanabe

Very poor performance when I used Speech2Text

The second audio still looks too long. Can you try to chunk the audio into segments shorter than 15 seconds?
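Here is a minimal sketch of the suggested chunking. It assumes the audio is already loaded as a 1-D sequence of samples (for example a list or NumPy array) at a known sample rate. `chunk_audio` is a hypothetical helper and not an ESPnet API. Each returned chunk would then be decoded with `Speech2Text` separately.

```python
def chunk_audio(samples, sample_rate, max_seconds=15.0):
    """Split a 1-D sequence of audio samples into consecutive chunks.

    Hypothetical helper: every chunk holds at most ``max_seconds`` of audio,
    and the last chunk may be shorter. It works on anything that supports
    len() and slicing (lists, NumPy arrays, ...).
    """
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    # Number of samples that fit into one chunk.
    max_len = int(sample_rate * max_seconds)
    if max_len == 0:
        raise ValueError("max_seconds is too short for this sample rate")
    # Fixed-length, non-overlapping windows over the whole signal.
    return [samples[start:start + max_len] for start in range(0, len(samples), max_len)]
```

For example, 35 seconds of 16 kHz audio would yield chunks of 15 s, 15 s and 5 s. Cutting at fixed positions can split a word across two chunks. Cutting at silences, for example with a voice activity detector, is gentler but needs an extra component.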

Very poor performance when I used Speech2Text

Thanks for your report. The result looks reasonable to me. The background noise and volume issues should be addressed by retraining a model under matched conditions.

When I try to use a self-supervised model as the front end, the log file shows that some samples got NaN gradients, and these were ignored in the CTC loss

Which config file are you using? @simpleoier, can you answer it for me?

Initial commit of heroico recipe.

LGTM! Once you update the results and upload the model, I'll merge it.

Initial commit of heroico recipe.

Thanks. WER seems to be broken. Can you check what is happening?
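When a reported WER looks wrong, one quick sanity check is to recompute it independently on a few utterances. The sketch below computes WER as the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. `word_error_rate` is a hypothetical helper, not the recipe's own scoring script, and it assumes whitespace-tokenized, already-normalized text.

```python
def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edit distance / total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be aligned one-to-one")
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        # prev[j]: edit distance between the first i-1 reference words
        # and the first j hypothesis words (row 0 = all insertions).
        prev = list(range(len(h) + 1))
        for i in range(1, len(r) + 1):
            curr = [i] + [0] * len(h)
            for j in range(1, len(h) + 1):
                cost = 0 if r[i - 1] == h[j - 1] else 1
                curr[j] = min(prev[j] + 1,         # deletion
                              curr[j - 1] + 1,     # insertion
                              prev[j - 1] + cost)  # substitution or match
            prev = curr
        total_errors += prev[len(h)]
        total_words += len(r)
    if total_words == 0:
        raise ValueError("no reference words")
    return total_errors / total_words
```

Because insertions count as errors, WER can exceed 1.0 (100%). A very large value alone therefore does not prove that the scorer is broken. Comparing the recipe's numbers against an independent computation on a few utterances can help separate a scoring bug from a text-normalization mismatch.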

Initial commit of heroico recipe.

> Yes, I'll look into it. Regarding the corpus, we (West Point/ARL) own it. The LDC told us that we have the right to post it to openslr.org.

Sounds good....

Initial commit of heroico recipe.

@johnjosephmorgan, is there any progress? We'd be happy to help with your PR. If you run into any issues, you can post them here or email me directly ([email protected]).

cuDNN CTC loss deterministic mode is probably not working

Thanks for your report. I found that the first forward pass is deterministic, while the first backward pass and later computations are not deterministic in some cases, probably due to this issue.

Marathi Openslr female speaker ASR

@simpleoier, there are two PRs for the same database. Both of them passed the CI check. Just FYI. @shubhamphal, I'll give you more concrete instructions on how to merge your...

Espnet multimodal PR

@chorongi, it seems that your change to the Makefile (`tools/Makefile`) is causing the issue, especially the numpy-related part. Please check it. Also, move the multimodal tool installation to the extra...