Christian Schäfer

Results: 72 comments by Christian Schäfer

Hi, unfortunately we currently do not have the (time) resources to write papers. The closest would be FastSpeech, which the current implementation is based on: https://arxiv.org/abs/1905.09263 We did some non-scientific...

We will do a comparison soon, hopefully. The latencies in the FastSpeech paper are measured with a batch size of 1, which is an unrealistic setting for production systems, so...
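To make that concrete, here is a minimal benchmarking sketch (not from the repo; `model` and the dummy token inputs are placeholders for whatever acoustic model and front end you actually use) that times a forward pass at several batch sizes, which is how one would contrast a batch-size-1 number with a more production-like setting:

```python
# Hypothetical latency benchmark; `model` is any module taking (batch, seq_len) token ids.
import time
import torch

def measure_latency(model, seq_len=200, batch_sizes=(1, 8, 32), n_runs=20):
    model.eval()
    results = {}
    with torch.no_grad():
        for bs in batch_sizes:
            # Dummy token ids; real inputs depend on the model's text front end.
            x = torch.randint(0, 100, (bs, seq_len))
            # Warm-up run so lazy initialisation does not skew the timing.
            model(x)
            start = time.perf_counter()
            for _ in range(n_runs):
                model(x)
            elapsed = (time.perf_counter() - start) / n_runs
            results[bs] = {'batch_latency_s': elapsed,
                           'per_utterance_s': elapsed / bs}
    return results
```

Per-utterance latency usually drops well below the batch-size-1 figure once requests are batched, which is why batch-size-1 comparisons alone can be misleading for production systems.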

Hi, for TorchScript it is necessary to make some changes to the model. I will investigate this if I have some spare time.

Hi, I tested some fine-tuning using a multispeaker model. Honestly, results were a bit mixed; I would mostly recommend just training on a single corpus for best quality. If data...

Hi, the current state is that I have implemented a jit-compatible model here: https://github.com/as-ideas/ForwardTacotron/blob/experiments/enable_jit/models/forward_tacotron.py It will take a bit of experimenting though before I decide to merge it into master, feel free...

Hi, good news, the jit export is implemented now: https://github.com/as-ideas/ForwardTacotron#export-model-with-torchscript
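The README link above documents the project's actual export command. As a generic illustration of the TorchScript workflow only, the sketch below scripts, saves, and reloads a tiny stand-in module; the real export uses the repo's own model and script:

```python
# Generic TorchScript export sketch; TinyAcousticModel is a toy stand-in,
# not the ForwardTacotron model.
import torch
import torch.nn as nn

class TinyAcousticModel(nn.Module):
    """Stand-in module; the point is the export/reload workflow."""
    def __init__(self, n_symbols: int = 100, n_mels: int = 80):
        super().__init__()
        self.embedding = nn.Embedding(n_symbols, 256)
        self.proj = nn.Linear(256, n_mels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.embedding(x))

model = TinyAcousticModel().eval()

# Scripting only works if the model code is jit-compatible
# (type annotations, no unsupported Python constructs).
scripted = torch.jit.script(model)
scripted.save('model.ts')

# The saved archive can be loaded without the original Python class,
# e.g. in a serving process or from C++ via libtorch.
restored = torch.jit.load('model.ts')
with torch.no_grad():
    mel = restored(torch.randint(0, 100, (1, 50)))
print(mel.shape)  # torch.Size([1, 50, 80])
```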

Hi, you could use M-AILABS https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/ for a male speaker. The format is pretty similar. Multispeaker is only possible with a branch (multispeaker). Other languages are not a problem, check...
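For reference, here is a rough conversion sketch (not part of the repo) for bringing an M-AILABS speaker into an LJSpeech-style layout. It assumes the speaker directory contains one folder per book, each with a pipe-separated metadata.csv (file_id|text|...) and a wavs/ subfolder; verify this against your actual download before using it:

```python
# Hypothetical M-AILABS -> LJSpeech-style conversion; paths and the assumed
# metadata layout should be checked against the downloaded dataset.
import shutil
from pathlib import Path

def mailabs_to_ljspeech(mailabs_speaker_dir: str, out_dir: str) -> None:
    src = Path(mailabs_speaker_dir)
    dst = Path(out_dir)
    (dst / 'wavs').mkdir(parents=True, exist_ok=True)
    lines = []
    for meta in src.glob('*/metadata.csv'):
        wav_dir = meta.parent / 'wavs'
        for line in meta.read_text(encoding='utf-8').splitlines():
            if not line.strip():
                continue
            file_id = line.split('|')[0]
            wav = wav_dir / f'{file_id}.wav'
            if wav.exists():
                shutil.copy(wav, dst / 'wavs' / wav.name)
                lines.append(line)
    (dst / 'metadata.csv').write_text('\n'.join(lines), encoding='utf-8')

# Example (hypothetical paths):
# mailabs_to_ljspeech('de_DE/by_book/male/karlsson', 'data/karlsson_lj')
```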

Good luck! I know that the branch is pretty far behind, but it should give an idea of how to do it. Unfortunately, I currently don't have time to work on multispeaker.

Hi, I am currently experimenting with it. So far, it seems to improve the pauses predicted by the model, which are often gone with the standard implementation. Also, it is...

Hi, interesting idea - would this be applicable to mel spectra? As far as I understand, it's more of a metric for comparing the final audio wav files, probably more...
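To illustrate the distinction: a mel-level comparison can be computed directly on model outputs without vocoding, whereas waveform metrics need the final audio. The sketch below is just a plain L1 distance over log-mel frames, not the metric discussed here, and a real evaluation would need proper time alignment (e.g. DTW):

```python
# Simple mel-level distance for illustration only.
import numpy as np

def mel_l1_distance(mel_a: np.ndarray, mel_b: np.ndarray) -> float:
    """Mean absolute error between two (n_mels, T) log-mel spectrograms.

    Crude alignment: the longer spectrogram is truncated to the shorter one;
    a proper comparison would align frames first (e.g. with DTW).
    """
    n_frames = min(mel_a.shape[1], mel_b.shape[1])
    return float(np.mean(np.abs(mel_a[:, :n_frames] - mel_b[:, :n_frames])))
```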