Shinji Watanabe comments

Results: 318 comments of Shinji Watanabe

Add native ONNX export support to encoders and CTC

I want to minimize the changes that come from this PR. So my suggestion is to keep `make_pad_mask` as it is and name the new `make_pad_mask` something like `make_pad_mask_without_reference` or `make_pad_mask_onnx`, or whatever...
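To illustrate the idea, here is a minimal pure-Python sketch of what a reference-free pad mask could compute, assuming it receives only the sequence lengths (and optionally a maximum length). The name `make_pad_mask_onnx` is one of the names proposed above, but the body is hypothetical: ESPnet's existing `make_pad_mask` works on tensors, and this sketch does not reproduce its signature.

```python
from typing import List, Optional


def make_pad_mask_onnx(lengths: List[int], maxlen: Optional[int] = None) -> List[List[bool]]:
    """Hypothetical reference-free pad mask: True marks padded positions.

    Only `lengths` (and optionally `maxlen`) are used; no padded input
    tensor is needed to infer the mask shape.
    """
    if maxlen is None:
        maxlen = max(lengths)
    # Same result as broadcasting arange(maxlen)[None, :] >= lengths[:, None]
    return [[t >= length for t in range(maxlen)] for length in lengths]


print(make_pad_mask_onnx([3, 1, 2]))
# [[False, False, False], [False, True, True], [False, False, True]]
```

In a tensor version, the same broadcast comparison depends only on `lengths`, which is why a separate function avoids touching the existing `make_pad_mask` behavior.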

Add unified S3PRL adapter interface

Can you share a model link and results in README.md?

When I run egs2/librimix/tse1/run.sh, the loss is 0.000e+00 all the time

I just added @Emrys365 to this thread. Most likely, something went wrong in the training stage, since the message "The grad norm is nan." appears. I suspect that 1. the optimization parameters are wrong...
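A minimal stdlib sketch of the kind of guard that produces a message like "The grad norm is nan.", assuming gradients are given as plain lists of floats. The function names are hypothetical and this is not ESPnet's trainer code; it only shows that a single non-finite gradient value makes the global norm non-finite, after which the update is typically skipped.

```python
import math
from typing import List


def global_grad_norm(grads: List[List[float]]) -> float:
    """L2 norm over all gradient values of all parameters."""
    return math.sqrt(sum(g * g for param in grads for g in param))


def should_skip_update(grads: List[List[float]]) -> bool:
    """Hypothetical guard: skip the optimizer step if the norm is nan or inf."""
    norm = global_grad_norm(grads)
    if not math.isfinite(norm):
        print(f"The grad norm is {norm}. Skipping this update.")
        return True
    return False


print(should_skip_update([[0.1, 0.2], [float("nan")]]))  # True
```

If this happens on most steps, checking the optimization settings (e.g., the learning rate) is a reasonable first step.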

Creating AISHELL/asr2 recipe

@freddy5566 and @simpleoier, is it finished? If so, @simpleoier, you can convert it from a draft to a regular PR and merge it after the CI checks pass.

[WIP] Text Injection ASR

Can you limit it to asr1 only in this PR? Including asr2 would make the changes too large.

Using VITS fast training

I don't think it is straightforward. You would have to read both codebases and understand their interfaces, which may take more effort than reading the training process documentation.

How to continue training from the (x+1)th epoch if the model is already trained up to the xth epoch?

You can resume your training by specifying `--max_epoch n`, where n is larger than the number of epochs already trained.
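A minimal sketch of why raising `--max_epoch` continues from epoch x+1, assuming the trainer restores the number of completed epochs from a saved checkpoint and then loops up to `max_epoch`. `epochs_to_run` is a hypothetical helper for illustration, not the actual trainer code.

```python
from typing import List


def epochs_to_run(completed_epochs: int, max_epoch: int) -> List[int]:
    """Hypothetical resume logic: continue right after the last finished epoch.

    `completed_epochs` would come from a checkpoint; training only proceeds
    if `max_epoch` is larger than what has already been trained.
    """
    return list(range(completed_epochs + 1, max_epoch + 1))


# Trained up to epoch 10; raising max_epoch to 15 runs epochs 11..15.
print(epochs_to_run(10, 15))  # [11, 12, 13, 14, 15]
print(epochs_to_run(10, 10))  # [] (nothing left to train)
```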

Any plans to switch to a newer version of Whisper?

Good suggestion. @slSeanWU, can you take a look at it to see whether we can solve this issue (and also whether we can support large-v3)?

Traceback (most recent call last):
  File "pyscripts/utils/calculate_rtf.py", line 113, in <module>
    main()
  File "pyscripts/utils/calculate_rtf.py", line 90, in main
    assert len(audio_durations) == len(end_times), (
AssertionError: (0, 151)
# Accounting: time=0 threads=1
# Ended (code 1) at Wed Jan 10 02:02:22 IST 2024, elapsed time 0 seconds

Can you add more information? In which file does this happen? Does it happen with all files? This script parses the log file, so the log file format might have changed...
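The `AssertionError: (0, 151)` most likely means that no audio durations were parsed while 151 end times were. A minimal sketch of how that happens when a regex-based parser meets a changed log format; the log line patterns and `parse_log` are hypothetical and are not the ones used by `pyscripts/utils/calculate_rtf.py`.

```python
import re
from typing import List, Tuple

# Hypothetical log line formats (assumptions, not calculate_rtf.py's patterns).
DURATION_RE = re.compile(r"audio duration: ([\d.]+)")
END_TIME_RE = re.compile(r"end time: ([\d.]+)")


def parse_log(lines: List[str]) -> Tuple[List[float], List[float]]:
    """Collect durations and end times from lines matching the expected format."""
    durations: List[float] = []
    end_times: List[float] = []
    for line in lines:
        m = DURATION_RE.search(line)
        if m:
            durations.append(float(m.group(1)))
        m = END_TIME_RE.search(line)
        if m:
            end_times.append(float(m.group(1)))
    return durations, end_times


# If the duration line changed format, only end times are found.
durations, end_times = parse_log(["Duration=1.5 sec", "end time: 2.0"])
print((len(durations), len(end_times)))  # (0, 1): the length check would fail
```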

Add some documents for JTubeSpeech

@Takaaki-Saeki and @vebmaylrie, I'm not sure how to get the single-speaker partition. Can you give me more information? FYI, https://github.com/sarulab-speech/jtubespeech?tab=readme-ov-file#step5-asv-speaker-variation-scoring seems to be incomplete.