
Is it possible to know or highlight which word is being spoken?

Open • sickerin opened this issue 6 months ago • 1 comment

For instance, after generating a paragraph, I would like to know when each word starts in time. Take this text: "The boy was there when the sun rose. A rod is used to catch pink salmon." Along with the audio, I would like to get the start time of each word, for example:

The 0.0s, boy 0.5s, was 1.2s, there 2.0s, etc.

I'm trying to use the Kokoro model. Are there other lightweight models available on MLX that would be able to do this?
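To make the request concrete, here is a minimal sketch of how such per-word start times could drive highlighting once they are available. It does not call mlx-audio or Kokoro; the `word_starts` list, `word_at`, and `highlight` are hypothetical names, and the timings are just the illustrative values from the example above.

```python
from bisect import bisect_right

# Hypothetical timing data in the shape asked for above: (word, start time in seconds).
# These values are illustrative, not the output of any model.
word_starts = [("The", 0.0), ("boy", 0.5), ("was", 1.2), ("there", 2.0)]


def word_at(timings, t):
    """Return the index of the word being spoken at playback time t.

    Returns None if t is before the first word starts.
    """
    starts = [start for _, start in timings]
    # bisect_right finds how many start times are <= t; the last of those
    # is the word currently being spoken.
    i = bisect_right(starts, t) - 1
    return i if i >= 0 else None


def highlight(timings, t):
    """Render the sentence with the current word wrapped in brackets."""
    current = word_at(timings, t)
    return " ".join(
        f"[{word}]" if k == current else word
        for k, (word, _) in enumerate(timings)
    )


print(highlight(word_starts, 0.7))  # The [boy] was there
```

A player would call `highlight` with the current playback position on each UI tick. The binary search assumes the start times are sorted in ascending order, which should hold for words in spoken order.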

sickerin • Jun 13 '25 15:06

I found that Kokoro has this, though I'm not sure whether it's implemented here: https://github.com/hexgrad/kokoro/issues/32

sickerin • Aug 10 '25 03:08