Is tacotron inference real-time?

Open msobhan69 opened this issue 8 years ago • 4 comments

I trained the model with one sample. The results from eval.py are very noisy, but they are recognizable as human speech. Generating one sample takes the following times (input: 80 time steps, 4 s of audio):

encoding_decoding: 23.3s
spectrogram2wav(): 0.96s

This delay is well above real time. Isn't Tacotron inference real-time?
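To put a number on "more than real-time", here is a minimal sketch that computes the real-time factor (processing time divided by audio duration) from the timings reported above. The function name `real_time_factor` is hypothetical and not part of the repository; a value above 1 means generation is slower than playback.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Timings reported in this post for a 4 s utterance.
encoding_decoding = 23.3
spectrogram2wav = 0.96
audio_seconds = 4.0

total = encoding_decoding + spectrogram2wav
rtf = real_time_factor(total, audio_seconds)

# A factor above 1 means slower than real time.
print(f"total: {total:.2f} s, real-time factor: {rtf:.1f}")
print("real-time" if rtf <= 1.0 else "slower than real-time")
```

With these numbers the factor is roughly 6, and almost all of it comes from encoding_decoding rather than spectrogram2wav().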

May 28 '17 07:05 msobhan69

I don't know, honestly. Does the original paper mention anything about it?

Jun 07 '17 10:06 Kyubyong

Dear @Kyubyong, the paper only says, "since Tacotron generates speech at the frame level, it’s substantially faster than sample-level autoregressive methods", and nothing more.
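As a rough illustration of the frame-level versus sample-level point in that quote, the sketch below compares the number of autoregressive steps for the 4 s example from this thread. The 80 decoder steps come from the first post; the sample rate of 22050 Hz is an assumption made only for illustration and may not match the repository's setting. A smaller step count does not by itself imply real-time speed, since each frame-level step can be much more expensive than a sample-level one.

```python
audio_seconds = 4.0
decoder_steps = 80  # frame-level steps reported in the first post

# Assumption for illustration only: the actual sample rate may differ.
assumed_sample_rate = 22050

# A sample-level autoregressive model takes one step per audio sample.
sample_level_steps = int(audio_seconds * assumed_sample_rate)
step_ratio = sample_level_steps / decoder_steps

print(f"frame-level steps:  {decoder_steps}")
print(f"sample-level steps: {sample_level_steps}")
print(f"ratio: {step_ratio:.1f}x fewer sequential steps")
```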

Jun 08 '17 03:06 msobhan69

@msobhan69 Thanks. I believe what the paper says is true, but I don't know whether it means Tacotron can generate samples in real time.

Jun 08 '17 04:06 Kyubyong

@msobhan69 Is this the time it takes to convert text to speech once the model is trained? In my case it takes 2 minutes to generate the audio. How can I reduce this time? Thanks.

May 21 '19 23:05 edwargl7
