Dedup segmented webvtt captions with the same id (fixes #1688)

Open gjdipietro opened this issue 4 months ago • 0 comments

Issue: #1688

Description:

addCue now checks whether a cue's identifier has already been processed and ignores any cue whose identifier it has seen before.

Why duplicates occur:

Segmented WebVTT files can split a cue across segment boundaries, so the same cue may appear in more than one segment. Deduplicating by identifier prevents that cue from being rendered twice, without affecting cues that legitimately share content but have different identifiers.
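
As a rough illustration of the identifier check described above, here is a minimal sketch. The Cue interface and the DedupingTrack class are hypothetical names invented for this example and are not the code in this PR; keeping cues that have no identifier is also an assumption.

```typescript
// Hypothetical cue shape for illustration; a real player would likely use VTTCue.
interface Cue {
  id: string;
  startTime: number; // seconds
  endTime: number; // seconds
  text: string;
}

// Hypothetical track wrapper that remembers which identifiers it has accepted.
class DedupingTrack {
  private readonly seenIds = new Set<string>();
  readonly cues: Cue[] = [];

  // Returns true if the cue was added, false if it was ignored as a duplicate.
  addCue(cue: Cue): boolean {
    // Assumption: a cue without an identifier cannot be matched, so it is always kept.
    if (cue.id !== "") {
      if (this.seenIds.has(cue.id)) return false;
      this.seenIds.add(cue.id);
    }
    this.cues.push(cue);
    return true;
  }
}

const track = new DedupingTrack();
const text = "We are in New York City";

// The same cue arriving from two segments: only the first copy is kept.
const firstCopy = track.addCue({ id: "1", startTime: 11, endTime: 13, text });
const secondCopy = track.addCue({ id: "1", startTime: 11, endTime: 13, text });

// Same text but a different identifier: treated as a separate cue.
const sameTextNewId = track.addCue({ id: "5", startTime: 30, endTime: 32, text });

console.log(firstCopy, secondCopy, sameTextNewId, track.cues.length);
```

Tracking identifiers in a Set keeps the check constant-time per cue, which matters little for short caption files but avoids rescanning the cue list on every segment.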

Example — Segmented WebVTT:

Segment 1:

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:456512,LOCAL:00:00:00.000

1
00:00:11.000 --> 00:00:13.000
<v Roger Bingham>We are in New York City

Segment 2:

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:456512,LOCAL:00:00:00.000

1
00:00:11.000 --> 00:00:13.000
<v Roger Bingham>We are in New York City

2
00:00:13.001 --> 00:00:16.000
<v Roger Bingham>We're actually at the Lucern Hotel, just down the street

3
00:00:16.000 --> 00:00:18.000
<v Roger Bingham>from the American Museum of Natural History

4
00:00:18.000 --> 00:00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson
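
To make the effect on the two segments above concrete, the sketch below runs them through a deliberately tiny parser and merges the results by identifier. parseCues and mergeSegments are hypothetical helpers written for this illustration; they handle only the simple "identifier, timing line, text" blocks shown here, not the full WebVTT grammar, and they are not the parser used by the player.

```typescript
interface ParsedCue {
  id: string;
  timing: string;
  text: string;
}

// Hypothetical minimal parser: splits on blank lines and keeps blocks with a timing line.
function parseCues(segment: string): ParsedCue[] {
  const cues: ParsedCue[] = [];
  const blocks = segment.replace(/\r\n/g, "\n").trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n").map((line) => line.trim());
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex < 0) continue; // header block (WEBVTT, X-TIMESTAMP-MAP)
    cues.push({
      id: timingIndex > 0 ? lines[timingIndex - 1] ?? "" : "",
      timing: lines[timingIndex] ?? "",
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}

// Hypothetical merge step: the first cue seen for each identifier wins.
function mergeSegments(segments: string[]): ParsedCue[] {
  const seenIds = new Set<string>();
  const merged: ParsedCue[] = [];
  for (const segment of segments) {
    for (const cue of parseCues(segment)) {
      if (cue.id !== "" && seenIds.has(cue.id)) continue;
      if (cue.id !== "") seenIds.add(cue.id);
      merged.push(cue);
    }
  }
  return merged;
}

const header = ["WEBVTT", "X-TIMESTAMP-MAP=MPEGTS:456512,LOCAL:00:00:00.000", ""];
const cue1 = ["1", "00:00:11.000 --> 00:00:13.000", "<v Roger Bingham>We are in New York City", ""];

const segment1 = [...header, ...cue1].join("\n");
const segment2 = [
  ...header,
  ...cue1,
  "2", "00:00:13.001 --> 00:00:16.000", "<v Roger Bingham>We're actually at the Lucern Hotel, just down the street", "",
  "3", "00:00:16.000 --> 00:00:18.000", "<v Roger Bingham>from the American Museum of Natural History", "",
  "4", "00:00:18.000 --> 00:00:20.000", "<v Roger Bingham>And with me is Neil deGrasse Tyson",
].join("\n");

const mergedCues = mergeSegments([segment1, segment2]);
console.log(mergedCues.map((cue) => cue.id)); // ids 1 through 4, each once
```

Without the identifier check, the merged list would contain cue 1 twice, which is the double rendering this change is meant to prevent.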

This would follow the approach used by hls.js to deduplicate cues across segments. See: https://github.com/video-dev/hls.js/issues/4563

Ready?

Ready to be reviewed

Anything Else?

Before: [screenshot]

After: [screenshot, 2025-09-04]

Sep 05 '25 11:09 gjdipietro

Dedup segmented webvtt captions with the same id (fixes #1688)
