Shinji Watanabe

Results 318 comments of Shinji Watanabe

I want to minimize the changes introduced by this PR. So, my suggestion is to keep `make_pad_mask` as it is and name the new `make_pad_mask` something like `make_pad_mask_without_reference` or `make_pad_mask_onnx` or whatever....
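To illustrate the naming suggestion, here is a minimal, framework-free sketch. It is not the actual ESPnet code: the signatures are simplified, plain Python lists stand in for tensors, and the body of `make_pad_mask_onnx` is a hypothetical placeholder. It assumes `True` marks padded positions.

```python
from typing import List, Optional, Sequence


def make_pad_mask(
    lengths: Sequence[int], maxlen: Optional[int] = None
) -> List[List[bool]]:
    """Existing helper, left untouched (simplified stand-in).

    Returns a mask where True marks padded positions.
    """
    if maxlen is None:
        maxlen = max(lengths)
    return [[i >= n for i in range(maxlen)] for n in lengths]


def make_pad_mask_onnx(lengths: Sequence[int], maxlen: int) -> List[List[bool]]:
    """Hypothetical new variant, added under its own name.

    It takes an explicit maxlen (an assumption made for this sketch), so
    existing callers of make_pad_mask are not affected by the new code path.
    """
    return [[i >= n for i in range(maxlen)] for n in lengths]
```

Because the new behavior lives under a separate name, existing call sites keep calling `make_pad_mask` unchanged, and only the new code path opts in to the variant.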

Can you share a model link and results in README.md?

I just added @Emrys365 to this thread. Most likely, something went wrong in the training stage, since the log says "The grad norm is nan." I suspect that 1. the optimization parameters are wrong...

@freddy5566 and @simpleoier, is it finished? If so, @simpleoier, you can convert it from a draft to a regular PR and merge it after the CI checks pass.

Can you limit this PR to asr1 only? Including asr2 would make for too many changes.

I think it is not straightforward. You have to read both pieces of code and understand the interface, which may take longer than reading the training process document.

Good suggestion. @slSeanWU, can you take a look at it to see whether we can solve this issue (and also whether we can support large-v3)?

@Takaaki-Saeki and @vebmaylrie, I'm not sure how to get the single-speaker partition. Can you give me more information? FYI, https://github.com/sarulab-speech/jtubespeech?tab=readme-ov-file#step5-asv-speaker-variation-scoring seems to be incomplete.