parler-tts
parler-tts copied to clipboard
Numbering pronounce
I have tried to put some digits on spaces https://huggingface.co/spaces/parler-tts/parler_tts, for example text:Hey, how are you doing today? i lived in 36th street it working fine.
but from this readme github to do the sample code bellow
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")
prompt = "Hey, how are you doing today? i lived in 36th street"
description = "Laura's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
above code is resulting difference output with broken number pronounce, what's the difference ? is the model different ?