Kokoro with all supported languages and voices + Orpheus added to API and UI

Open ivanfioravanti opened this issue 11 months ago • 12 comments

/voices API added to get list of Kokoro voices and filter them by language for the frontend.

Closes #29 and #30
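To illustrate the filtering described above, here is a minimal sketch, not the PR's actual code. It assumes Kokoro voice IDs begin with a language letter followed by a gender letter (as in `af_heart`). The `VOICES` sample and the `list_voices` helper are hypothetical names introduced only for this example.

```python
from typing import Optional

# Hypothetical sample of voice IDs; a real server would load the full list.
VOICES = ["af_heart", "af_bella", "am_adam", "bf_emma", "bm_george", "jf_alpha"]


def list_voices(language: Optional[str] = None) -> list[str]:
    """Return all voices, or only those whose ID starts with `language`.

    Assumption: the first letter of a voice ID encodes its language.
    """
    if language is None:
        return list(VOICES)
    return [voice for voice in VOICES if voice.startswith(language)]
```

A `/voices` endpoint could return `list_voices()` by default and `list_voices(language)` when the frontend passes a language code.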

ivanfioravanti avatar Mar 23 '25 21:03 ivanfioravanti

This is great!

I was thinking about the same but for all models.

Because Orpheus has several voices as well.

Blaizzy avatar Mar 23 '25 21:03 Blaizzy

It's a great idea! Adding Orpheus model and voices right now 🚀

ivanfioravanti avatar Mar 23 '25 21:03 ivanfioravanti

Done and ready for review @Blaizzy 🚀

ivanfioravanti avatar Mar 23 '25 22:03 ivanfioravanti

@Blaizzy I tested all Orpheus voices one by one, and some of them are not working. Tara, Zac, and Zoe produce long audio with empty parts or prolonged audio, even when generating from the command line. Give them a try.

ivanfioravanti avatar Mar 23 '25 22:03 ivanfioravanti

Hey Ivan

Yes, you are right! I noticed the same.

I would remove those voices for now. Add some comments and we can revisit them later.

Blaizzy avatar Mar 26 '25 23:03 Blaizzy

We can try to add back all voices after #68

ivanfioravanti avatar Mar 29 '25 14:03 ivanfioravanti

Closed by mistake, working on it.

ivanfioravanti avatar Mar 29 '25 15:03 ivanfioravanti

No worries, let me know when you're ready :)

Blaizzy avatar Mar 29 '25 16:03 Blaizzy

Ok @Blaizzy ready to go. Orpheus was capped at 15 seconds of audio. I changed the logic so that the text can be split in multiple ways. Everything seems good to me:

  • All voices and languages added for Orpheus
  • Longer audio generation in Orpheus
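The splitting logic itself is not shown in this thread. As a rough sketch of one way to split long text so each piece stays short enough to generate, the hypothetical `split_text` helper below groups sentences into chunks under a character budget. The function name, the `max_chars` parameter, and its default value are assumptions for illustration.

```python
import re


def split_text(text: str, max_chars: int = 200) -> list[str]:
    """Group sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to the model separately and the resulting audio segments joined in order.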

ivanfioravanti avatar Mar 29 '25 16:03 ivanfioravanti

@Blaizzy ready!

ivanfioravanti avatar Mar 29 '25 17:03 ivanfioravanti

@lucasnewman could you please check the sesame changes and see if anything stands out?

I noticed that generate doesn't process a list of prompts the way Kokoro (pipeline) and Orpheus do.

Initially I thought of requiring all models to use a pipeline to handle lists of inputs, but for Orpheus I kept that logic inside generate: since it's an LLM, the pipeline would have been just a few lines of code.

Blaizzy avatar Mar 29 '25 18:03 Blaizzy

> @lucasnewman could you please check the sesame changes and see if anything stands out?

Looks fine to me apart from your comments.

> I noticed that generate doesn't process a list of prompts the way Kokoro (pipeline) and Orpheus do.
>
> Initially I thought of requiring all models to use a pipeline to handle lists of inputs, but for Orpheus I kept that logic inside generate: since it's an LLM, the pipeline would have been just a few lines of code.

Yeah, I personally prefer the simplest approach and lighter abstraction. I think it's reasonable to have every generate() implementation take either a string or list of strings though, since sentence splitting is so common / useful.
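A minimal sketch of that idea, assuming nothing about the real model code: the hypothetical `normalize_prompts` helper turns either input form into a list, and the stand-in `generate` yields one placeholder result per prompt where a real model would synthesize audio.

```python
from typing import Iterator, Union


def normalize_prompts(text: Union[str, list[str]]) -> list[str]:
    """Accept a single string or a list of strings; always return a list."""
    if isinstance(text, str):
        return [text]
    return list(text)


def generate(text: Union[str, list[str]]) -> Iterator[str]:
    """Hypothetical stand-in for a model's generate(): one result per prompt."""
    for prompt in normalize_prompts(text):
        # A real implementation would run the model on `prompt` here.
        yield f"<audio for {prompt!r}>"
```

With this shape, callers that already split text into sentences can pass the list directly, while simple callers keep passing a single string.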

lucasnewman avatar Mar 29 '25 19:03 lucasnewman