Shinji Watanabe
I want to minimize the changes that come from this PR. So, my suggestion is to keep `make_pad_mask` as it is and name the new `make_pad_mask` something like `make_pad_mask_without_reference` or `make_pad_mask_onnx` or whatever....
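To illustrate the naming arrangement, here is a minimal sketch. It assumes plain Python lists in place of tensors, and the signature of the second function (an explicit `max_len` argument) is hypothetical; neither body is the actual ESPnet code.

```python
from typing import List


def make_pad_mask(lengths: List[int]) -> List[List[bool]]:
    """Existing helper, kept as it is. True marks a padded position."""
    max_len = max(lengths)
    return [[i >= n for i in range(max_len)] for n in lengths]


def make_pad_mask_without_reference(
    lengths: List[int], max_len: int
) -> List[List[bool]]:
    """Hypothetical new variant under its own name (signature is an assumption).

    It takes max_len explicitly instead of deriving it, so existing callers
    of make_pad_mask are not affected by its introduction.
    """
    return [[i >= n for i in range(max_len)] for n in lengths]


# Old call sites stay untouched; only new code opts in to the new name.
mask_old = make_pad_mask([3, 1])
mask_new = make_pad_mask_without_reference([3, 1], max_len=3)
```

With this layout the diff only adds a function, so existing behavior and tests for `make_pad_mask` do not need to change.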
Can you share a model link and results in README.md?
I just added @Emrys365 to this thread. Most likely, something happened in the training stage, since the log says "The grad norm is nan." I suspect that 1. the optimization parameters are wrong...
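For context, a "grad norm is nan" message typically comes from a check of roughly the following shape. This is a plain-Python sketch with hypothetical helper names (`grad_norm`, `should_update`), not the trainer's actual code.

```python
import math
from typing import List


def grad_norm(grads: List[float]) -> float:
    """L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def should_update(grads: List[float]) -> bool:
    """Return False when the norm is NaN or infinite, so the step is skipped."""
    return math.isfinite(grad_norm(grads))


ok_step = should_update([0.3, -0.4])           # finite norm
bad_step = should_update([0.3, float("nan")])  # one NaN poisons the norm
```

A single NaN or infinite gradient value is enough to make the whole norm non-finite, which is why a learning rate that is too large or other wrong optimization settings are a common first suspect.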
@freddy5566 and @simpleoier, is it finished? If so, @simpleoier, you can convert it from a draft to a regular PR and merge it after the CI check.
Can you limit this PR to asr1 only? It would be too many changes if we included asr2.
I think it is not straightforward. You have to read both pieces of code and understand the interface, which may take longer than reading the training process document.
You can resume your training by specifying `--max_epoch n`.
Good suggestion. @slSeanWU, can you take a look at it to see whether we can solve this issue (and also whether we can support large-v3)?
Can you add more information? In which file does this happen? Does it happen with all files? This file parses the log file. So, the log file format might have changed....
@Takaaki-Saeki and @vebmaylrie, I'm not sure how to get the single-speaker partition. Can you give me more information? FYI, https://github.com/sarulab-speech/jtubespeech?tab=readme-ov-file#step5-asv-speaker-variation-scoring seems to be incomplete.