fix(text-track): normalize whitespace in content before parsing

Open pzanella opened this issue 2 months ago • 0 comments

Fixes SRT subtitle parsing issues where comma-separated milliseconds (e.g., 00:00:01,000) fail to parse when content contains indented template literals. Closes #1671

Description:

This PR adds whitespace normalization to the #parseContent method in TextTrack to handle indented template literal content before passing it to the media-captions parser.

Changes:

Core Fix: Added content.split('\n').map(line => line.trim()).join('\n').trim() to normalize whitespace while preserving line structure
Test Coverage: Added comprehensive unit tests (16 test cases) covering whitespace normalization scenarios
Edge Cases: Handles mixed indentation, Windows line endings, tabs, and extremely large indentation
Backward Compatibility: Preserves empty lines between subtitle blocks and doesn't affect JSON content parsing

Problem Solved: The media-captions library expects clean content without leading/trailing whitespace. When using template literals with indentation for inline SRT content, the parser would fail with errors like "cue start timestamp \'00:00:00,000\' is invalid" even though the SRT format itself was correct.

Ready?

✅ Yes - All tests passing, edge cases covered, and backward compatibility maintained.

Anything Else?

Before (Failed):

const track = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle`
});
// ❌ Error: cue start timestamp `00:00:01,000` is invalid

After (Works)

const track = new TextTrack({
  kind: 'subtitles', 
  type: 'srt',
  content: `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle`
});
// ✅ Parses successfully - whitespace normalized automatically

Test Results: ✅ 16/16 tests passing

Test Category	Tests	Status	Coverage
Whitespace Normalization	11	✅ Pass	Template literals, mixed indentation, leading/trailing whitespace
Edge Cases	3	✅ Pass	Tabs, Windows line endings (`\r\n`), extreme indentation
Regression Tests	2	✅ Pass	SRT comma timestamps, VTT period timestamps

Detailed Coverage:

✅ Template literal indentation scenarios
✅ Mixed whitespace handling
✅ Empty line preservation between subtitle blocks
✅ JSON content bypass (no normalization applied)
✅ Windows line endings (\r\n) compatibility
✅ Tab characters and extreme indentation handling
✅ SRT comma timestamps (00:00:01,000) parsing
✅ VTT period timestamps (00:00:01.000) parsing
✅ Complex content with HTML formatting and special characters
✅ Single line content normalization
✅ Whitespace-only content handling

Manual Test Scenarios:

Template Literal Testing

// Test 1: Basic indented SRT content
const track1 = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle
    
    2
    00:00:06,000 --> 00:00:10,000
    Second subtitle`
});
// Expected: ✅ Parses successfully

Cross-platform Line Endings

// Test 2: Windows line endings (\r\n)
const windowsContent = "1\r\n00:00:01,000 --> 00:00:05,000\r\nSubtitle text";
const track2 = new TextTrack({
  kind: 'subtitles',
  type: 'srt', 
  content: `    ${windowsContent}`
});
// Expected: ✅ Handles \r\n correctly

JSON Content Bypass

// Test 3: JSON content should skip normalization
const track3 = new TextTrack({
  kind: 'subtitles',
  type: 'json',
  content: {
    cues: [{ startTime: 1, endTime: 5, text: "Test" }]
  }
});
// Expected: ✅ No normalization applied to JSON

VTT Format Testing

// Test 4: VTT with period timestamps
const track4 = new TextTrack({
  kind: 'subtitles',
  type: 'vtt',
  content: `    WEBVTT
    
    1
    00:00:01.000 --> 00:00:05.000
    VTT subtitle`
});
// Expected: ✅ VTT format works with periods

Edge Case Validation

// Test 5: Mixed whitespace (tabs + spaces)
const mixedContent = "\t  1\n\t00:00:01,000 --> 00:00:05,000\n\t  Mixed whitespace";
const track5 = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: mixedContent
});
// Expected: ✅ Handles mixed tab/space indentation

Oct 29 '25 10:10 pzanella

fix(text-track): normalize whitespace in content before parsing

Related:

Description:

Ready?

Anything Else?

Template Literal Testing

Cross-platform Line Endings

JSON Content Bypass

VTT Format Testing

Edge Case Validation