[show and tell] apple mps support

Open bghira opened this issue 4 months ago • 7 comments

With newer PyTorch (2.4 nightly) we get bfloat16 support on MPS.

I tested this:

from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
import torch

device = "mps:0"

# bfloat16 on MPS needs a recent PyTorch (2.4 nightly at the time of writing)
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1").to(device=device, dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

prompt = "welcome to huggingface"
description = "An old man."

# the voice description drives input_ids, the spoken text drives prompt_input_ids
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device=device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device=device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
# NumPy has no bfloat16 dtype, so cast to float32 before moving to CPU
audio_arr = generation.to(torch.float32).cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)

bghira avatar Apr 10 '24 19:04 bghira

That's awesome, thanks for sharing @bghira! How fast was inference on your local machine?

sanchit-gandhi avatar Apr 11 '24 11:04 sanchit-gandhi

It gets slower as the sample size increases, but this test script takes about 10 seconds to run on an M3 Max.
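
For anyone reproducing these timings, here is a minimal sketch of how the generation step could be measured, reusing the model and inputs from the script above (torch.mps.synchronize() makes sure queued MPS work is included in the wall-clock time):

import time
import torch

# assumes `model`, `input_ids` and `prompt_input_ids` from the script above
torch.mps.synchronize()  # flush pending MPS work before starting the clock
start = time.perf_counter()
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
torch.mps.synchronize()  # wait for generation to finish on the GPU
elapsed = time.perf_counter() - start

audio_seconds = generation.shape[-1] / model.config.sampling_rate
print(f"{audio_seconds:.1f} s of audio generated in {elapsed:.1f} s")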

bghira avatar Apr 11 '24 12:04 bghira

I got this working as well! Inference time seems to increase more than linearly with prompt size:

  • 3 seconds of audio: 10 seconds of generation
  • 8 seconds of audio: ~90 seconds of generation
  • 10 seconds of audio: ~3 minutes of generation

I think the reason is that inference itself takes a surprising amount of memory: loading the model takes the expected ~3 GB, but inference then takes 15 GB on top of that, which is probably what's slowing it down on my machine (16 GB M2).
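
A quick way to check whether memory (and swapping) is the bottleneck is to read the MPS allocator counters around generation. A rough sketch, assuming the model and inputs from the script at the top of the thread:

import torch

def mps_mem_gb():
    # tensor memory currently allocated vs. total memory held by the Metal driver
    return (torch.mps.current_allocated_memory() / 1e9,
            torch.mps.driver_allocated_memory() / 1e9)

print("before generate:", mps_mem_gb())
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
print("after generate:", mps_mem_gb())

If driver_allocated_memory climbs towards the machine's total unified memory, macOS will start compressing and swapping, which would match the slowdown described above.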

maxtheman avatar Apr 12 '24 02:04 maxtheman

Is swapping kicking in? I will try on a Mac mini M2 (24 GB). Do we know the performance on CUDA on a similar machine?

QueryType avatar Apr 12 '24 02:04 QueryType

On the 128 GB M3 Max I can get pretty far into the output window before the time increases to 3 minutes.

It takes about a minute for 30 seconds of audio.

bghira avatar Apr 12 '24 02:04 bghira

I am getting 2 seconds of audio in 11 seconds of generation and 6 seconds of audio in 36 seconds.

QueryType avatar Apr 12 '24 13:04 QueryType

My data, on a 64 GB M2 Max:

seconds of audio    CPU (seconds of generation)    MPS (seconds of generation)
1                   7                              10
3                   13                             17
7                   30                             44
9                   41                             194
18                  71                             308
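
For comparison with the table above, a rough harness along these lines could run the same prompt on both backends; this is only a sketch, with float32 assumed for the CPU run and bfloat16 for MPS as in the first script:

import time
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

repo = "parler-tts/parler_tts_mini_v0.1"
tokenizer = AutoTokenizer.from_pretrained(repo)
description = "An old man."
prompt = "welcome to huggingface"

for device, dtype in [("cpu", torch.float32), ("mps:0", torch.bfloat16)]:
    model = ParlerTTSForConditionalGeneration.from_pretrained(repo).to(device=device, dtype=dtype)
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    start = time.perf_counter()
    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    if device.startswith("mps"):
        torch.mps.synchronize()  # include queued GPU work in the timing
    elapsed = time.perf_counter() - start

    audio_s = generation.shape[-1] / model.config.sampling_rate
    print(f"{device}: {audio_s:.1f} s of audio in {elapsed:.1f} s")

Longer rows like the ones in the table would come from longer prompts.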

janewu77 avatar Apr 15 '24 09:04 janewu77