transformers [New model] Bark for realistic text-to-speech

Open xenova opened this issue 2 years ago • 0 comments

Model description

As stated in their README:

Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints ready for inference.

Some of their demos are quite amazing (albeit slightly creepy): the model can add "uhms" and "ahhs" to the synthesized audio. For example:

Hello, my name is Suno. And, uh — and I like pizza. [laughs] 
But I also have other interests such as playing tic tac toe.

https://user-images.githubusercontent.com/5068315/230490503-417e688d-5115-4eee-9550-b46a2b465ee3.webm

Open source status

[X] The model implementation is available
[X] The model weights are available

Provide useful links for the implementation

GitHub repo: https://github.com/suno-ai/bark
Author: @gkucsko
Demo: https://huggingface.co/spaces/suno/bark
Model weights: Although not very well documented, here is the portion of the code which links to the model weights. @Vaibhavs10 also looks to have uploaded them to the HF Hub here 🔥

Apr 27 '23 22:04 xenova
