
Number pronunciation

Open kunci115 opened this issue 5 months ago • 0 comments

I tried entering text that contains digits in the Space at https://huggingface.co/spaces/parler-tts/parler_tts, for example: "Hey, how are you doing today? i lived in 36th street". It works fine there.

But when I run the sample code below, taken from the GitHub README:

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")

prompt = "Hey, how are you doing today? i lived in 36th street"
description = "Laura's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)

The code above produces different output, with the number pronounced incorrectly. What is the difference? Is the model different?

kunci115, Sep 19 '24 07:09