media Speaker info for WebVTT subtitles

We're using WebvttParser() to parse WebVTT transcripts, which internally uses WebvttCueParser() to parse cues. WebvttCueParser.parseCueText(...) returns the payload with style formatting, but it strips the speaker identifiers (the voice tags, TAG_VOICE).

Can we get built-in support for speaker info within the cues of WebVTT transcripts?

Aug 20 '24 03:08 ashiagr

I guess you're referring to this? https://www.w3.org/TR/webvtt1/#webvtt-cue-voice-span

We do have some code already that parses <v[voice="foo"]> tags in order to resolve the associated styles: https://github.com/search?q=repo%3Aandroidx%2Fmedia+path%3Avtt+voice&type=code

But the voice information isn't directly exposed in the resulting Cue object.

I suspect this would need to be exposed using a 'custom span' in Cue.text, like we do for Japanese rubies: https://github.com/androidx/media/blob/release/libraries/common/src/main/java/androidx/media3/common/text/RubySpan.java
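To make the 'custom span' idea concrete, here is a rough, dependency-free sketch of the shape it could take: the voice tags are stripped from the cue text, and each speaker name is kept alongside the character range it covers. This is not the androidx/media implementation. `VoiceCueSketch`, `VoiceRange` and `parse` are hypothetical names invented for illustration, and the tag handling is deliberately simplified (all non-voice tags are dropped, and entities such as `&amp;` are not decoded).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: strips WebVTT tags and records which speaker covers which text range. */
final class VoiceCueSketch {

  /** Hypothetical stand-in for a span: a speaker name attached to [start, end) of the plain text. */
  static final class VoiceRange {
    final String name;
    final int start;
    final int end;

    VoiceRange(String name, int start, int end) {
      this.name = name;
      this.start = start;
      this.end = end;
    }
  }

  final String text;
  final List<VoiceRange> voices;

  private VoiceCueSketch(String text, List<VoiceRange> voices) {
    this.text = text;
    this.voices = voices;
  }

  // group(1): "/" for a closing tag; group(2): tag name; group(3): annotation (the speaker name).
  // Classes such as ".loud" in <v.loud Mary> are matched and ignored.
  private static final Pattern TAG =
      Pattern.compile("<(/?)(\\w*)[^\\s>]*(?:\\s+([^>]*))?>");

  static VoiceCueSketch parse(String cueText) {
    StringBuilder plain = new StringBuilder();
    List<VoiceRange> voices = new ArrayList<>();
    String openName = null; // speaker of the currently open <v> tag, if any
    int openStart = -1;
    int last = 0;

    Matcher m = TAG.matcher(cueText);
    while (m.find()) {
      plain.append(cueText, last, m.start()); // copy the text between tags
      last = m.end();
      if (!"v".equals(m.group(2))) {
        continue; // simplification: every non-voice tag is just dropped
      }
      if (openName != null) {
        // Either </v> or a new <v ...> ends the current voice range.
        voices.add(new VoiceRange(openName, openStart, plain.length()));
        openName = null;
      }
      if (m.group(1).isEmpty()) {
        openName = m.group(3) == null ? "" : m.group(3).trim();
        openStart = plain.length();
      }
    }
    plain.append(cueText, last, cueText.length());
    if (openName != null) {
      // The closing </v> may be omitted; the range then runs to the end of the cue.
      voices.add(new VoiceRange(openName, openStart, plain.length()));
    }
    return new VoiceCueSketch(plain.toString(), voices);
  }
}
```

For a cue like `<v Esme>Hello</v> <v.loud Mary>Hi there`, this yields the plain text plus one range per speaker. In a real Cue the same information would presumably be attached to Cue.text as a span object over that range, similar to how RubySpan is attached.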

I'll mark this as an enhancement. I'm afraid we are unlikely to work on this ourselves soon, but we would consider a high quality PR implementing this.

Aug 20 '24 12:08 icbaker

Closing this because https://github.com/androidx/media/pull/1652 has been merged - thanks!

Sep 02 '24 16:09 icbaker