Support `text-to-speech` in `pipeline` function and in Optimum

Open • josephrocca opened this issue 2 years ago • 9 comments

Feature request

SpeechT5 was recently added to Transformers:

  • Blog post: https://huggingface.co/blog/speecht5
  • Spaces demo: https://huggingface.co/spaces/Matthijs/speecht5-tts-demo
  • Models: https://huggingface.co/mechanicalsea/speecht5-tts

It would be great if text-to-speech could be supported across the Transformers stack.

Motivation

@xenova bumped into this as an issue when trying to get SpeechT5 working in the browser (Transformers.js).

Your contribution

Probably unable to help with this at the moment.

josephrocca • Mar 31 '23

cc @sanchit-gandhi

sgugger • Mar 31 '23

Indeed, a TTS pipeline would be super helpful to run SpeechT5. We're currently planning on waiting till we have 1-2 more TTS models in the library before pushing ahead with a TTS pipeline, in order to verify that the pipeline is generalisable and gives a benefit over loading a single model + processor.

cc @hollance

sanchit-gandhi • Apr 04 '23

Any viable contenders for the other 1-2 models? https://paperswithcode.com/task/text-to-speech-synthesis

josephrocca • Apr 04 '23

Hey, I'd be more than happy to take up this task if we can decide on the other 1-2 models.

mayankagarwals • Apr 06 '23

> Hey, I'd be more than happy to take up this task if we can decide on the other 1-2 models

We can probably just select the most popular models from the hub: https://huggingface.co/models?pipeline_tag=text-to-speech&sort=downloads

xenova • Apr 06 '23

There is an open PR for FastSpeech2. I think this is a good new model to add. If anyone is interested in taking that PR to completion, that would be awesome!

hollance • Apr 07 '23

> Hey, I'd be more than happy to take up this task if we can decide on the other 1-2 models

Let me know if you need any help! I’m excited for this to be added 🔥

xenova • Apr 18 '23

Here's another model which could fall into the text-to-speech category: https://github.com/huggingface/transformers/issues/23036

xenova • Apr 27 '23

Just added one more https://github.com/huggingface/transformers/issues/23050

jozefchutka • Apr 28 '23

Please add support for the mms-tts model, mentioned in the issue above, to the TTS pipeline.

bil-ash • Jul 22 '23

Good news! This is currently being worked on: https://github.com/huggingface/transformers/pull/24952 🚀🔥

xenova • Jul 22 '23