
Speaker info for WebVTT subtitles

Open ashiagr opened this issue 1 year ago • 1 comments

We're using WebvttParser() to parse WebVTT transcripts, which internally uses WebvttCueParser() to parse cues. WebvttCueParser.parseCueText(...) returns the payload with style formatting applied, but it strips the speaker identifiers (the voice tags, TAG_VOICE).

Can we get built-in support for speaker info within the cues of WebVTT transcripts?
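For context, a minimal sketch of the kind of information being asked for: pulling the speaker names out of a raw cue payload before the tags are stripped. This is plain Java with a regular expression, not Media3 code. The class name VoiceTagExtractor and the method speakers are hypothetical, and the pattern covers only the common `<v Name>` and `<v.class Name>` forms without decoding character references such as `&amp;`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Media3.
final class VoiceTagExtractor {
  // Matches an opening voice tag such as <v Alice> or <v.loud Bob Smith>.
  // Group 1 captures the annotation, which is the speaker name.
  private static final Pattern VOICE_START =
      Pattern.compile("<v(?:\\.[^\\s>]*)?[ \\t]+([^>]*)>");

  /** Returns the speaker names in the order their voice tags appear. */
  static List<String> speakers(String rawCuePayload) {
    List<String> names = new ArrayList<>();
    Matcher m = VOICE_START.matcher(rawCuePayload);
    while (m.find()) {
      names.add(m.group(1).trim());
    }
    return names;
  }

  public static void main(String[] args) {
    // prints [Alice, Bob]
    System.out.println(speakers("<v Alice>Hello.</v> <v Bob>Hi!</v>"));
  }
}
```

This only recovers the names, not which part of the text each speaker said, so it is a workaround rather than a replacement for parser support.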

ashiagr avatar Aug 20 '24 03:08 ashiagr

I guess you're referring to this? https://www.w3.org/TR/webvtt1/#webvtt-cue-voice-span

We do have some code already that parses <v[voice="foo"]> tags in order to resolve the associated styles: https://github.com/search?q=repo%3Aandroidx%2Fmedia+path%3Avtt+voice&type=code

But the voice information isn't directly exposed in the resulting Cue object.

I suspect this would need to be exposed using a 'custom span' in Cue.text, like we do for Japanese rubies: https://github.com/androidx/media/blob/release/libraries/common/src/main/java/androidx/media3/common/text/RubySpan.java
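To illustrate the 'custom span' idea in isolation, here is a rough sketch that strips voice tags from a payload and records each speaker name against the character range it covers in the stripped text. It is plain Java rather than Android's Spanned API, and every name in it (VoiceSpanSketch, VoiceRange, ParsedCue, parse) is hypothetical. It is not the Media3 implementation, and it ignores other tags and nested voices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and structure are assumptions, not Media3 API.
final class VoiceSpanSketch {
  /** Stand-in for a custom span: a speaker name covering [start, end) of the text. */
  static final class VoiceRange {
    final String name;
    final int start;
    final int end;

    VoiceRange(String name, int start, int end) {
      this.name = name;
      this.start = start;
      this.end = end;
    }
  }

  /** Tag-free cue text plus the voice ranges that annotate it. */
  static final class ParsedCue {
    final String text;
    final List<VoiceRange> voices;

    ParsedCue(String text, List<VoiceRange> voices) {
      this.text = text;
      this.voices = voices;
    }
  }

  // Matches <v Name>, <v.class Name> and </v>. Group 1 is "/" for a closing tag,
  // group 2 is the speaker annotation (may be absent).
  private static final Pattern VOICE_TAG =
      Pattern.compile("<(/?)v(?:\\.[^\\s>]*)?(?:[ \\t]+([^>]*))?>");

  static ParsedCue parse(String payload) {
    StringBuilder text = new StringBuilder();
    List<VoiceRange> voices = new ArrayList<>();
    String openName = null;
    int openStart = 0;
    int last = 0;
    Matcher m = VOICE_TAG.matcher(payload);
    while (m.find()) {
      // Copy the text between the previous tag and this one.
      text.append(payload, last, m.start());
      last = m.end();
      // Any voice tag (opening or closing) ends the currently open voice.
      if (openName != null) {
        voices.add(new VoiceRange(openName, openStart, text.length()));
        openName = null;
      }
      boolean closing = !m.group(1).isEmpty();
      if (!closing) {
        openName = m.group(2) == null ? "" : m.group(2).trim();
        openStart = text.length();
      }
    }
    text.append(payload, last, payload.length());
    // A voice left open runs to the end of the cue.
    if (openName != null) {
      voices.add(new VoiceRange(openName, openStart, text.length()));
    }
    return new ParsedCue(text.toString(), voices);
  }
}
```

For the payload `<v Alice>Hello.</v> <v Bob>Hi!</v>` this yields the text `Hello. Hi!` with Alice covering [0, 6) and Bob covering [7, 10). A real span attached to Cue.text would carry the same name-plus-range information.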

I'll mark this as an enhancement. I'm afraid we are unlikely to work on this ourselves soon, but we would consider a high quality PR implementing this.

icbaker avatar Aug 20 '24 12:08 icbaker

Closing this because https://github.com/androidx/media/pull/1652 has been merged - thanks!

icbaker avatar Sep 02 '24 16:09 icbaker