fix(text-track): normalize whitespace in content before parsing

Open pzanella opened this issue 2 months ago • 0 comments

Related:

Fixes SRT subtitle parsing where timestamps with comma-separated milliseconds (e.g., 00:00:01,000) fail to parse when the content comes from an indented template literal. Closes #1671

Description:

This PR adds whitespace normalization to the #parseContent method in TextTrack, so that indented template literal content is cleaned up before it is passed to the media-captions parser.

Changes:

  • Core Fix: Added content.split('\n').map(line => line.trim()).join('\n').trim() to normalize whitespace while preserving line structure
  • Test Coverage: Added comprehensive unit tests (16 test cases) covering whitespace normalization scenarios
  • Edge Cases: Handles mixed indentation, Windows line endings, tabs, and extremely large indentation
  • Backward Compatibility: Preserves empty lines between subtitle blocks and doesn't affect JSON content parsing
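
The core fix can be tried in isolation. The sketch below wraps the expression from the Core Fix bullet in a standalone function; the name `normalizeWhitespace` is hypothetical and used only for illustration (in the PR the expression lives inside `#parseContent`).

```javascript
// Hypothetical standalone helper; the PR applies the same expression
// inside TextTrack's #parseContent method.
function normalizeWhitespace(content) {
  return content
    .split('\n') // keep the line structure
    .map((line) => line.trim()) // strip indentation and trailing whitespace, including '\r'
    .join('\n')
    .trim(); // drop leading/trailing blank lines
}

const indented = `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle

    2
    00:00:06,000 --> 00:00:10,000
    Second subtitle`;

const normalized = normalizeWhitespace(indented);
console.log(normalized);
```

Because each line is trimmed individually rather than the whole string being collapsed, the empty line between the two cues survives, which SRT relies on to separate subtitle blocks.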

Problem Solved: The media-captions library expects clean content without leading/trailing whitespace. When using template literals with indentation for inline SRT content, the parser would fail with errors like "cue start timestamp '00:00:00,000' is invalid" even though the SRT format itself was correct.
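
To see where the stray whitespace comes from, it helps to print what an indented template literal actually contains. This is a plain JavaScript illustration and makes no assumption about how media-captions matches timestamps internally.

```javascript
const content = `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle`;

// Every line keeps the indentation that was typed inside the template literal.
const lines = content.split('\n');
for (const line of lines) {
  console.log(JSON.stringify(line));
}

// The timestamp line therefore starts with spaces rather than a digit.
const timestampLine = lines[1];
console.log(timestampLine.startsWith('00:00:01,000')); // false
console.log(timestampLine.trim().startsWith('00:00:01,000')); // true
```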

Ready?

Yes - All tests passing, edge cases covered, and backward compatibility maintained.

Anything Else?

Before (Failed):

const track = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle`
});
// ❌ Error: cue start timestamp `00:00:01,000` is invalid

After (Works):

const track = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle`
});
// ✅ Parses successfully - whitespace normalized automatically

Test Results: 16/16 tests passing

| Test Category | Tests | Status | Coverage |
| --- | --- | --- | --- |
| Whitespace Normalization | 11 | ✅ Pass | Template literals, mixed indentation, leading/trailing whitespace |
| Edge Cases | 3 | ✅ Pass | Tabs, Windows line endings (\r\n), extreme indentation |
| Regression Tests | 2 | ✅ Pass | SRT comma timestamps, VTT period timestamps |

Detailed Coverage:

  • ✅ Template literal indentation scenarios
  • ✅ Mixed whitespace handling
  • ✅ Empty line preservation between subtitle blocks
  • ✅ JSON content bypass (no normalization applied)
  • ✅ Windows line endings (\r\n) compatibility
  • ✅ Tab characters and extreme indentation handling
  • ✅ SRT comma timestamps (00:00:01,000) parsing
  • ✅ VTT period timestamps (00:00:01.000) parsing
  • ✅ Complex content with HTML formatting and special characters
  • ✅ Single line content normalization
  • ✅ Whitespace-only content handling

Manual Test Scenarios:

Template Literal Testing

// Test 1: Basic indented SRT content
const track1 = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle
    
    2
    00:00:06,000 --> 00:00:10,000
    Second subtitle`
});
// Expected: ✅ Parses successfully

Cross-platform Line Endings

// Test 2: Windows line endings (\r\n)
const windowsContent = "1\r\n00:00:01,000 --> 00:00:05,000\r\nSubtitle text";
const track2 = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: `    ${windowsContent}`
});
// Expected: ✅ Handles \r\n correctly
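
The \r\n case works because `String.prototype.trim()` treats a carriage return as whitespace, so splitting on '\n' and trimming each line also drops the trailing '\r'. A minimal sketch, applying the core-fix expression outside of TextTrack (the variable name `normalizedCrlf` is illustrative only):

```javascript
const windowsContent = '1\r\n00:00:01,000 --> 00:00:05,000\r\nSubtitle text';

// Same expression as the core fix, applied directly to the string.
const normalizedCrlf = `    ${windowsContent}`
  .split('\n')
  .map((line) => line.trim()) // removes the trailing '\r' left behind by split('\n')
  .join('\n')
  .trim();

console.log(JSON.stringify(normalizedCrlf));
console.log(normalizedCrlf.includes('\r')); // false
```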

JSON Content Bypass

// Test 3: JSON content should skip normalization
const track3 = new TextTrack({
  kind: 'subtitles',
  type: 'json',
  content: {
    cues: [{ startTime: 1, endTime: 5, text: "Test" }]
  }
});
// Expected: ✅ No normalization applied to JSON
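
One way such a bypass could look is a type guard in front of the normalization. This is only a sketch: the function name `prepareContent` is hypothetical, and it assumes JSON content arrives as an object, which may not match the actual check inside #parseContent.

```javascript
// Hypothetical guard; the real condition used in #parseContent may differ.
function prepareContent(content) {
  // Non-string content (e.g. a JSON cue object) is passed through untouched.
  if (typeof content !== 'string') return content;
  return content
    .split('\n')
    .map((line) => line.trim())
    .join('\n')
    .trim();
}

const jsonContent = { cues: [{ startTime: 1, endTime: 5, text: 'Test' }] };

console.log(prepareContent(jsonContent) === jsonContent); // true
console.log(JSON.stringify(prepareContent('   1\n   text  ')));
```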

VTT Format Testing

// Test 4: VTT with period timestamps
const track4 = new TextTrack({
  kind: 'subtitles',
  type: 'vtt',
  content: `    WEBVTT
    
    1
    00:00:01.000 --> 00:00:05.000
    VTT subtitle`
});
// Expected: ✅ VTT format works with periods

Edge Case Validation

// Test 5: Mixed whitespace (tabs + spaces)
const mixedContent = "\t  1\n\t00:00:01,000 --> 00:00:05,000\n\t  Mixed whitespace";
const track5 = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: mixedContent
});
// Expected: ✅ Handles mixed tab/space indentation

pzanella · Oct 29 '25 10:10