sherpa-onnx
TTS scaling pause not working?
After seeing this merged PR (https://github.com/k2-fsa/sherpa-onnx/pull/1820), I thought pauses could be controlled, but that does not seem to be the case.
That PR adds a new config parameter named "silence_scale", but it does not seem to be valid, at least with Kokoro and VITS Piper TTS.
Additionally, length_scale does not set the pause length; it sets the overall speed.
Is this feature actually supported?
Thanks!
OK, so I can now add "silence_scale" without errors, but the pauses don't seem to change. Any advice?
Can you give a concrete example and upload a generated wav?
Sure. Here is an example using a VITS Piper model, "vits-piper-es_ES-sharvard-medium", which has very short pauses between sentences:
import os

import sherpa_onnx
import soundfile as sf

# vits_viper_model, n, vits_piper_path, data_dir, speed and output_file
# are defined elsewhere in my script.
model_name = vits_viper_model[n:]
model = os.path.join(vits_piper_path, f"{model_name}.onnx")
tokens = os.path.join(vits_piper_path, "tokens.txt")

tts_config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        vits=sherpa_onnx.OfflineTtsVitsModelConfig(
            model=model,
            lexicon="",
            data_dir=data_dir,
            tokens=tokens,
            # length_scale changes the overall speed, not the pauses
            length_scale=1.0 / speed,
        ),
        provider="cpu",
        debug=True,
        num_threads=2,
    ),
    # The parameter added in PR #1820
    silence_scale=0.6,
    # max_num_sentences=1,
)
tts = sherpa_onnx.OfflineTts(tts_config)

text = "La tierra es lo que la historia ha dejado escrito en sus pueblos y paisajes, modelados por personajes que dejaron su huella a la trata sobre senderos y caminos. Extremadura es un cofre que guarda el silencio de quienes embarcaron para descubrir nuevos mundos y el vuelo pausado de las aves migratorias que han decidido convertir en suyos nuestros paisajes"
audio = tts.generate(text, sid=0)

# Write the result as 16-bit PCM
sf.write(
    output_file,
    audio.samples,
    samplerate=audio.sample_rate,
    subtype="PCM_16",
)
The generated file is attached. You can see that the pause at around 00:08 is minimal, even though a new sentence starts there.
Thanks!
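To put numbers on the pause rather than judging it by ear, something like the following could be run on audio.samples for several silence_scale values and the results compared. This is only a sketch: find_pauses, the amplitude threshold and the minimum pause length are hypothetical choices made for illustration and are not part of sherpa-onnx. It assumes the samples are floats roughly in the range -1.0 to 1.0.

```python
def find_pauses(samples, sample_rate, threshold=0.01, min_pause=0.05):
    """Return a list of (start_seconds, duration_seconds) for quiet runs.

    A sample counts as quiet when its absolute value is below `threshold`.
    Runs shorter than `min_pause` seconds are ignored.
    Hypothetical helper, not a sherpa-onnx API.
    """
    min_len = min_pause * sample_rate
    pauses = []
    run_start = None  # index where the current quiet run began

    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            length = i - run_start
            if length >= min_len:
                pauses.append((run_start / sample_rate, length / sample_rate))
            run_start = None

    # Handle a quiet run that lasts until the end of the audio
    if run_start is not None:
        length = len(samples) - run_start
        if length >= min_len:
            pauses.append((run_start / sample_rate, length / sample_rate))

    return pauses


# Possible usage with the audio object from tts.generate():
# for start, duration in find_pauses(list(audio.samples), audio.sample_rate):
#     print(f"pause at {start:.2f}s lasting {duration:.2f}s")
```

If the reported durations are the same for different silence_scale values, that would support the observation that the setting has no effect for this model.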
I'm also having this issue and am looking for a solution.
Is there a solution to this problem?
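Until this is answered, one possible workaround (not a fix for silence_scale itself) might be to synthesize each sentence separately and insert the pause by hand. The sketch below assumes tts.generate() can be called once per sentence, as in the example earlier in this thread. split_sentences and join_with_silence are hypothetical helpers written for illustration and are not sherpa-onnx APIs, and the regex-based sentence splitting is deliberately naive.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def join_with_silence(chunks, sample_rate, pause_seconds):
    """Concatenate sample lists, inserting zero-valued samples between them."""
    gap = [0.0] * int(round(pause_seconds * sample_rate))
    out = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            out.extend(gap)  # explicit pause before every chunk but the first
        out.extend(chunk)
    return out


# Possible usage, assuming `tts` is the OfflineTts object created above:
# chunks = []
# sample_rate = None
# for sentence in split_sentences(text):
#     audio = tts.generate(sentence, sid=0)
#     sample_rate = audio.sample_rate
#     chunks.append(list(audio.samples))
# samples = join_with_silence(chunks, sample_rate, pause_seconds=0.5)
```

The pause length is then set directly in seconds, independently of the model. Any silence the model itself produces at the start or end of each sentence is kept, so the effective pause may be slightly longer than pause_seconds.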