[show and tell] apple mps support
With newer PyTorch (2.4 nightly) we get bfloat16 support on MPS. I tested this:
```python
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
import torch

device = "mps:0"

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1").to(device=device, dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

prompt = "welcome to huggingface"
description = "An old man."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device=device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device=device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.to(torch.float32).cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```
That's awesome, thanks for sharing @bghira! How fast was inference on your local machine?
it gets slower as the sample size increases but this test script takes about 10 seconds to run on an M3 Max.
I got this working as well! Inference time seems to increase more than linearly with prompt size:
- 3s of audio: 10 seconds of generation
- 8s of audio: ~90 seconds of generation
- 10s of audio: ~3 min of generation

I think the reason is that inference itself takes a surprising amount of memory: loading the model takes the expected ~3 GB, but inference then takes ~15 GB on top of that, which is probably what's slowing it down on my machine (16 GB M2).
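To see where the memory actually goes, PyTorch exposes allocation counters for the MPS backend. A minimal sketch, assuming `model`, `input_ids`, and `prompt_input_ids` are set up exactly as in the snippet from the first post:

```python
import torch

def mps_mem_gb():
    # memory currently held by tensors vs. total memory the Metal driver has claimed
    return (torch.mps.current_allocated_memory() / 1e9,
            torch.mps.driver_allocated_memory() / 1e9)

print("after model load:", mps_mem_gb())
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
torch.mps.synchronize()  # wait for queued GPU work before reading the counters
print("after generate:", mps_mem_gb())
```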
Is swapping activated? I will try on a Mac Mini M2 (24GB). Do we know the performance on CUDA on a similar machine?
On the 128GB M3 Max I can get pretty far into the output window before the time increases to 3 minutes.
It'll take about a minute for 30 seconds of audio.
I am getting 2s of audio: 11 seconds of generation, and 6s of audio: 36 seconds.
My data, on a 64GB M2 Max:
| seconds of audio | CPU (seconds of generation) | MPS (seconds of generation) |
|---|---|---|
| 1 | 7 | 10 |
| 3 | 13 | 17 |
| 7 | 30 | 44 |
| 9 | 41 | 194 |
| 18 | 71 | 308 |
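For anyone wanting to reproduce numbers like these, here is a rough timing sketch under a few assumptions: the prompts below are arbitrary, the model/tokenizer setup is the same as in the first post with the device swapped between CPU and MPS, and the audio length is derived from the raw sample tensor that `generate` returns (as the original snippet does before writing the wav):

```python
import time
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

description = "An old man."
prompts = [
    "hello",
    "welcome to huggingface",
    "a much longer sentence that should produce several seconds of audio output",
]

for device, dtype in [("cpu", torch.float32), ("mps:0", torch.bfloat16)]:
    model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1").to(device=device, dtype=dtype)
    tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    for prompt in prompts:
        prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
        start = time.perf_counter()
        generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
        if device.startswith("mps"):
            torch.mps.synchronize()  # make sure the GPU work has actually finished
        elapsed = time.perf_counter() - start
        audio_seconds = generation.shape[-1] / model.config.sampling_rate
        print(f"{device}: {audio_seconds:.1f}s of audio in {elapsed:.1f}s")
```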
I'm getting this error:

```
NotImplementedError: Output channels > 65536 not supported at the MPS device. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
```
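As a stopgap (not a real fix), the fallback the message mentions can be enabled, but it's read when torch initializes, so it has to be set before the import (or exported in the shell first). A minimal sketch:

```python
import os

# must be set before the first `import torch`, otherwise it has no effect;
# the unsupported op then runs on CPU, which is slower
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # noqa: E402
```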
Did something change or is it still working for you?
```
In [2]: torch.__version__
Out[2]: '2.5.0.dev20240726'
```
My suggestion is to stick with PyTorch 2.4 unless you want things blowing up constantly.