WhisperSpeech

Map architecture in config.json and tokenizer.json files on HuggingFace

Open bartekupartek opened this issue 10 months ago • 1 comment

I've experimented with Bark and your model, and I found yours simpler to follow and lighter than Bark. I'd like to port it to the Elixir Bumblebee project. It seems that pipeline.py, which essentially combines a speaker, text-to-semantic, semantic-to-audio and a vocoder, is all I need to adapt to enable TTS in my favorite language. I tried to load WhisperSpeech from HuggingFace in Elixir Bumblebee but got stuck at the very beginning because the required config.json and tokenizer.json files, and perhaps safetensors files, are missing. Are you planning to support this, or could anyone provide or point me to the required fields and values? This would let me load all the models natively. The alternative would be the ONNX runtime, but that would add extra overhead in my case.
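A minimal sketch of the check described above: given a listing of file names from a model repository, report which of the files named in the question are absent. The pattern list comes only from this post (whether Bumblebee needs exactly these is an assumption), and the example listing and the helper name `missing_files` are hypothetical.

```python
from fnmatch import fnmatch

# Patterns taken from the files named in the question; whether Bumblebee
# requires exactly these is an assumption, not a documented requirement.
REQUIRED_PATTERNS = ["config.json", "tokenizer.json", "*.safetensors"]


def missing_files(repo_files, required=REQUIRED_PATTERNS):
    """Return the required patterns that no file in repo_files matches."""
    missing = []
    for pattern in required:
        # Compare against the base name so files in subfolders still count.
        if not any(fnmatch(path.split("/")[-1], pattern) for path in repo_files):
            missing.append(pattern)
    return missing


# Hypothetical listing of a repository that ships only PyTorch checkpoints.
example_listing = ["README.md", "t2s-small.model", "s2a-small.model"]
print(missing_files(example_listing))
```

For a real repository, the listing could come from `huggingface_hub.list_repo_files(repo_id)` instead of the hard-coded example.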

bartekupartek, Mar 28 '24 17:03

Hey, I am not sure how the Hugging Face models are used in Bumblebee. I followed a naming convention similar to Hugging Face's, but the model is implemented from scratch in PyTorch.

ONNXRuntime may work, but I think their LLM support (and the architecture is pretty much like an LLM) was only just released in the most recent version, so you may run into some issues.

jpc, Apr 10 '24 09:04