Enable Intel XPU
Enable Intel XPU to accelerate TTS inference, since PyTorch 2.6 already supports Intel XPU: https://pytorch.org/docs/stable/notes/get_start_xpu.html.
Software prerequisites: https://pytorch.org/docs/stable/notes/get_start_xpu.html#software-prerequisite
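Before running on XPU, it can help to confirm that the installed PyTorch build actually exposes the device. A minimal sketch (the `pick_device` helper is illustrative, not part of MeloTTS), assuming PyTorch 2.6+ where `torch.xpu.is_available()` is available:

```python
import importlib.util

def pick_device():
    """Return 'xpu' when a PyTorch build with XPU support is available, else 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed at all
        return "cpu"
    import torch
    # torch.xpu.is_available() exists from PyTorch 2.6 onward
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    return "cpu"

print(pick_device())
```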
Below is my environment:
- CPU: Intel Core Ultra 7 165H
- RAM: 64GB
- OS: Ubuntu 22.04.5 LTS (kernel: 6.8.0-52-generic)
- Python version: 3.11.11
Below is my quick test comparing the TTS time taken on CPU and XPU:
| Language | CPU (s) | XPU (s) |
|---|---|---|
| EN | 14.0701 | 7.2301 |
| ZH | 19.3216 | 13.8332 |
| ES | 11.7642 | 4.4456 |
| FR | 17.0136 | 6.8164 |
| JP | 28.5677 | 7.3959 |
| KR | 29.7932 | 10.8559 |
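For reference, the per-language speedup implied by the table above can be computed directly from those numbers (the CPU/XPU ratio is the speedup factor):

```python
# Benchmark times in seconds, copied from the table above
cpu = {"EN": 14.0701, "ZH": 19.3216, "ES": 11.7642,
       "FR": 17.0136, "JP": 28.5677, "KR": 29.7932}
xpu = {"EN": 7.2301, "ZH": 13.8332, "ES": 4.4456,
       "FR": 6.8164, "JP": 7.3959, "KR": 10.8559}

# Speedup factor: how many times faster XPU is than CPU per language
speedup = {lang: cpu[lang] / xpu[lang] for lang in cpu}
for lang, factor in speedup.items():
    print(f"{lang}: {factor:.2f}x")
```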
I copied the PR to my local code, and it works. Computation runs on the XPU (the iGPU of the Intel Core Ultra 7 155H). By the way, overall GPU utilization is around 30%. Are there any optimizations to shorten the loading time and improve the RTF?
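On RTF: the real-time factor is usually computed as synthesis time divided by the duration of the generated audio, so RTF < 1 means faster than real time. A minimal standard-library sketch (`audio_duration_seconds` and `real_time_factor` are illustrative helpers, not MeloTTS APIs):

```python
import wave

def audio_duration_seconds(path):
    """Duration of a WAV file, derived from frame count and sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / length of audio produced."""
    return synthesis_seconds / audio_seconds
```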
Sorry for the late reply; I have been a bit busy recently and will look at the optimization part later. Can I ask how you measured the loading time? Previously I did my quick test using code like the below:
```python
from melo.api import TTS
import time

speed = 1.0
device = 'xpu'
text = '''Did you ever hear a folk tale about a giant turtle?'''
language = 'EN'

# Note: model loading happens here and is NOT included in the timing below
model = TTS(language=language, device=device)
speaker_ids = model.hps.data.spk2id
output_path = f'{language}.wav'

# Time only the synthesis step
start = time.time()
model.tts_to_file(text, speaker_ids['EN-Default'], output_path, speed=speed)
end = time.time()
print(f"Time taken: {end-start}")
```
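To separate model-loading time from synthesis time (the loading-time question above), each phase could be wrapped in a small timer. A minimal standard-library sketch (the `timed` helper and its labels are illustrative; one would wrap `TTS(...)` and `tts_to_file(...)` with it):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the `with` block."""
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.4f}s")

# Usage sketch:
#   with timed("model load"):
#       model = TTS(language=language, device=device)
#   with timed("synthesis"):
#       model.tts_to_file(text, speaker_id, output_path, speed=speed)
```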