[New model] Bark for realistic text-to-speech
Model description
As stated in their README:
Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints ready for inference.
Some of their demos are quite amazing (albeit slightly creepy): the model can add "uhms" and "ahhs" to the synthesized audio. For example:
Hello, my name is Suno. And, uh — and I like pizza. [laughs]
But I also have other interests such as playing tic tac toe.
https://user-images.githubusercontent.com/5068315/230490503-417e688d-5115-4eee-9550-b46a2b465ee3.webm
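For anyone who wants to try a prompt like this before a transformers port exists, here is a minimal sketch using the original `bark` package from the Suno repo. It assumes `bark` and `scipy` are installed and that `preload_models`, `generate_audio` and `SAMPLE_RATE` behave as shown in the upstream README. The `synthesize` helper, `EXAMPLE_PROMPT` and the output filename are hypothetical names chosen for illustration, and none of this is a transformers API.

```python
from pathlib import Path

# Shortened demo-style prompt; "[laughs]" is a nonverbal cue that Bark is
# described as being able to render in the generated audio.
EXAMPLE_PROMPT = "Hello, my name is Suno. [laughs]"


def synthesize(text_prompt: str, out_path: str = "bark_generation.wav") -> Path:
    """Generate audio for `text_prompt` with Bark and write it to a WAV file.

    Hypothetical helper: it only wraps the calls shown in the upstream README.
    """
    # Imported lazily so this file can be loaded without bark installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    # Load the pretrained checkpoints (assumed to be fetched and cached
    # on first use).
    preload_models()

    # generate_audio is assumed to return a NumPy array of audio samples.
    audio_array = generate_audio(text_prompt)

    # Write the samples to disk at Bark's sample rate.
    write_wav(out_path, SAMPLE_RATE, audio_array)
    return Path(out_path)
```

Calling `synthesize(EXAMPLE_PROMPT)` should produce a WAV file in the working directory, assuming the checkpoints can be fetched. A port to transformers would presumably expose this differently, so treat the helper as a reference for the upstream behavior only.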
Open source status
- [X] The model implementation is available
- [X] The model weights are available
Provide useful links for the implementation
GitHub repo: https://github.com/suno-ai/bark
Author: @gkucsko
Demo: https://huggingface.co/spaces/suno/bark
Model weights: Although not very well documented, here is the portion of the code which links to the model weights. @Vaibhavs10 also looks to have uploaded them to the HF Hub here 🔥