Christian Schäfer
Hi, unfortunately we currently don't have the (time) resources to write papers. The closest would be FastSpeech, which the current implementation is based on: https://arxiv.org/abs/1905.09263 We did some non-scientific...
We will hopefully do a comparison soon. The latencies in the FastSpeech paper are measured with a batch size of 1, which is an unrealistic setting for production systems, so...
Hi, for torchscript it is necessary to make some changes to the model. I will investigate this if I have some spare time.
Hi, I tested some fine-tuning using a multispeaker model. Honestly, results were a bit mixed; I would mostly recommend just training on a single corpus for best quality. If data...
Hi, the current state is that I implemented a jit-compatible model here: https://github.com/as-ideas/ForwardTacotron/blob/experiments/enable_jit/models/forward_tacotron.py It will take a bit of experimenting though before I decide to merge it into master, feel free...
Hi, good news, the jit export is implemented now: https://github.com/as-ideas/ForwardTacotron#export-model-with-torchscript
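For readers unfamiliar with the TorchScript workflow referenced above, here is a minimal, generic sketch of scripting and reloading a PyTorch module. Note this uses a hypothetical stand-in module (`ToyModel`), not the actual ForwardTacotron model or its export script — see the linked README section for the real command:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a jit-compatible model (NOT ForwardTacotron itself).
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)

model = ToyModel().eval()

# Compile the model to TorchScript and serialize it for deployment.
scripted = torch.jit.script(model)
scripted.save('model_scripted.pt')

# The saved archive can be loaded without the Python class definition,
# e.g. from C++ via libtorch or from another Python process.
loaded = torch.jit.load('model_scripted.pt')
out = loaded(torch.randn(1, 4))
print(out.shape)
```

The practical benefit is that the scripted archive is self-contained: inference no longer needs the original model source code, which is what makes production deployment easier.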
Hi, you could use M-AILABS https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/ for a male speaker. The format is pretty similar. Multispeaker is only possible with the multispeaker branch. Other languages are not a problem, check...
Good luck! I know that the branch is pretty far behind, but it should give an idea of how to do it. Unfortunately I don't have time currently to work on multispeaker.
Hi, I am currently experimenting with it. So far, it seems to improve the pauses predicted by the model, which are often missing with the standard implementation. Also, it is...
Hi, interesting idea - would this be applicable to mel spectra? As far as I understand, it's more of a metric to compare the final audio wav files, probably more...