TTS scaling pause not working?

Open juangon opened this issue 8 months ago • 5 comments

After seeing this merged PR (https://github.com/k2-fsa/sherpa-onnx/pull/1820), I thought pauses could be controlled, but that does not seem to be the case.

In that PR there is a new config parameter named "silence_scale", but it does not seem to be valid, at least with Kokoro and VITS Piper TTS.

Additionally, length_scale is not for setting the pause but the overall speed.

Is this feature actually supported?

Thanks!

juangon avatar Mar 23 '25 09:03 juangon

OK, so I have now added "silence_scale" without errors, but the pauses don't seem to change. Any advice?

juangon avatar Mar 23 '25 09:03 juangon

Can you give a concrete example and upload a generated wav?

csukuangfj avatar Mar 23 '25 10:03 csukuangfj

Sure. Here is an example using VITS Piper models, in this case "vits-piper-es_ES-sharvard-medium", which has very short pauses between sentences:

import os

import sherpa_onnx
import soundfile as sf

# vits_viper_model, n, vits_piper_path, data_dir, speed and output_file
# are defined elsewhere in the script.
model_name = vits_viper_model[n:]
model = os.path.join(vits_piper_path, f"{model_name}.onnx")
tokens = os.path.join(vits_piper_path, "tokens.txt")

tts_config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        vits=sherpa_onnx.OfflineTtsVitsModelConfig(
            model=model,
            lexicon="",
            data_dir=data_dir,
            tokens=tokens,
            # length_scale controls the overall speed, not the pauses
            length_scale=1.0 / speed,
        ),
        provider="cpu",
        debug=True,
        num_threads=2,
    ),
    # the parameter that is expected to scale the pauses
    silence_scale=0.6,
    # max_num_sentences=1,
)
tts = sherpa_onnx.OfflineTts(tts_config)

text = "La tierra es lo que la historia ha dejado escrito en sus pueblos y paisajes, modelados por personajes que dejaron su huella a la trata sobre senderos y caminos. Extremadura es un cofre que guarda el silencio de quienes embarcaron para descubrir nuevos mundos y el vuelo pausado de las aves migratorias que han decidido convertir en suyos nuestros paisajes"

audio = tts.generate(text, sid=0)

# write the result as 16-bit PCM
sf.write(
    output_file,
    audio.samples,
    samplerate=audio.sample_rate,
    subtype="PCM_16",
)

Generated file attached. You can hear that the pause at 00:08 is minimal, even though a new sentence starts there.
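To compare runs with different "silence_scale" values by numbers rather than by ear, the pauses in the generated samples can be measured. The following is only a minimal sketch: the helper name `silent_runs`, the amplitude threshold, and the minimum duration are assumptions chosen for illustration, and it assumes the samples are floats roughly in [-1, 1].

```python
import numpy as np


def silent_runs(samples, sample_rate, threshold=0.01, min_duration=0.05):
    """Return a list of (start_seconds, duration_seconds) for quiet stretches.

    Hypothetical helper, not part of sherpa-onnx. A sample counts as quiet
    when its absolute value is below `threshold`; runs shorter than
    `min_duration` seconds are ignored so zero crossings inside speech
    are not reported as pauses.
    """
    quiet = np.abs(np.asarray(samples, dtype=np.float32)) < threshold
    # Pad with False on both sides so every quiet run has a rising and a falling edge.
    padded = np.concatenate(([False], quiet, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)   # index of the first quiet sample of each run
    ends = np.flatnonzero(edges == -1)    # index one past the last quiet sample
    runs = []
    for start, end in zip(starts, ends):
        duration = (end - start) / sample_rate
        if duration >= min_duration:
            runs.append((start / sample_rate, duration))
    return runs
```

Calling `silent_runs(audio.samples, audio.sample_rate)` on the output of two runs that differ only in "silence_scale" would show whether the pause near 00:08 changes length at all.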

Thanks!

sherpa_onnxvits-piper-es_ES-sharvard-medium.zip

juangon avatar Mar 23 '25 10:03 juangon

I'm also having this issue, looking for a solution~

dong706 avatar Apr 07 '25 12:04 dong706

Is there a solution to this problem?

dong706 avatar Oct 18 '25 17:10 dong706
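
No fix is confirmed in this thread. One possible workaround, offered only as an untested sketch and not as the library's intended mechanism, is to synthesize each sentence separately and insert the silence yourself. The function name `synthesize_with_pauses`, the `synthesize` callable, the sentence-splitting regex, and the default pause length are all assumptions made for illustration.

```python
import re

import numpy as np


def synthesize_with_pauses(text, synthesize, sample_rate, pause_seconds=0.4):
    """Join per-sentence audio with a fixed amount of silence in between.

    Hypothetical helper, not part of sherpa-onnx. `synthesize` is any
    callable that maps a string to a 1-D sequence of float samples at
    `sample_rate`.
    """
    # Naive split after '.', '!' or '?'; real text may need a better splitter.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    gap = np.zeros(int(round(pause_seconds * sample_rate)), dtype=np.float32)
    pieces = []
    for i, sentence in enumerate(sentences):
        if i > 0:
            pieces.append(gap)  # silence only between sentences, not at the ends
        pieces.append(np.asarray(synthesize(sentence), dtype=np.float32))
    if not pieces:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(pieces)
```

With the `tts` object from the example above, `synthesize` could be `lambda s: tts.generate(s, sid=0).samples`, with the sample rate taken from the `sample_rate` of any generated result. Per-sentence synthesis may change the prosody at sentence boundaries, so the result should be checked by ear.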