Speaker info for WebVTT subtitles
We're using WebvttParser() to parse WebVTT transcripts; internally it uses WebvttCueParser() to parse cues.
While WebvttCueParser.parseCueText(...) returns the payload with style formatting, it strips the speaker identifiers (the voice tag, TAG_VOICE).
Can we get built-in support for speaker info in the cues of WebVTT transcripts?
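For context, here is roughly the information we are after. This is a minimal sketch in plain Java, not library code: VoiceSpanSketch and extractSpeakers are hypothetical names that are not part of androidx.media3. It assumes we have access to the raw cue payload before it reaches the parser. It reads the annotation of each voice start tag with a regular expression, does not decode character references such as &amp;, and does not track which text belongs to which speaker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of androidx.media3.
final class VoiceSpanSketch {
    // Matches a voice start tag such as <v Alice> or <v.loud Bob Smith>.
    // Group 1 is the annotation (the speaker name) after the optional classes.
    private static final Pattern VOICE_START =
            Pattern.compile("<v(?:\\.[^\\s>]*)?[ \\t]+([^>]*)>");

    /** Returns the speaker names of all voice start tags in the raw payload, in order. */
    static List<String> extractSpeakers(String rawCuePayload) {
        List<String> speakers = new ArrayList<>();
        Matcher matcher = VOICE_START.matcher(rawCuePayload);
        while (matcher.find()) {
            String name = matcher.group(1).trim();
            if (!name.isEmpty()) {
                speakers.add(name);
            }
        }
        return speakers;
    }

    public static void main(String[] args) {
        System.out.println(extractSpeakers("<v Alice>Hello</v> <v.loud Bob Smith>Hi!</v>"));
    }
}
```

A regex pass like this is only a stopgap on our side; having the parser attach the speaker to the cue text itself would avoid parsing the payload twice.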
I guess you're referring to this? https://www.w3.org/TR/webvtt1/#webvtt-cue-voice-span
We do have some code already that parses <v[voice="foo"]> tags in order to resolve the associated styles: https://github.com/search?q=repo%3Aandroidx%2Fmedia+path%3Avtt+voice&type=code
But the voice information isn't directly exposed in the resulting Cue object.
I suspect this would need to be exposed using a 'custom span' in Cue.text, as we do for Japanese ruby text: https://github.com/androidx/media/blob/release/libraries/common/src/main/java/androidx/media3/common/text/RubySpan.java
I'll mark this as an enhancement. I'm afraid we are unlikely to work on this ourselves soon, but we would consider a high quality PR implementing this.
Closing this because https://github.com/androidx/media/pull/1652 has been merged - thanks!