fix(text-track): normalize whitespace in content before parsing
Related:
Fixes an SRT subtitle parsing issue where timestamps with comma-separated milliseconds (e.g., 00:00:01,000) fail to parse when the content is supplied as an indented template literal.
Closes #1671
Description:
This PR adds whitespace normalization to the #parseContent method in TextTrack to handle indented template literal content before passing it to the media-captions parser.
Changes:
- Core Fix: Added `content.split('\n').map(line => line.trim()).join('\n').trim()` to normalize whitespace while preserving line structure
- Test Coverage: Added comprehensive unit tests (16 test cases) covering whitespace normalization scenarios
- Edge Cases: Handles mixed indentation, Windows line endings, tabs, and extremely large indentation
- Backward Compatibility: Preserves empty lines between subtitle blocks and doesn't affect JSON content parsing
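As a minimal sketch of the core fix listed above, the normalization expression can be wrapped in a standalone function. The name `normalizeCaptionWhitespace` is hypothetical (the PR applies the expression inside `#parseContent`), and the `typeof` guard is only an assumption about how non-string JSON content might be left untouched.

```javascript
// Hypothetical helper name; the PR applies this expression inside TextTrack's #parseContent.
function normalizeCaptionWhitespace(content) {
  // Assumption: non-string content (for example a JSON cue object) is returned untouched.
  if (typeof content !== 'string') return content;
  return content
    .split('\n') // keep the line structure
    .map((line) => line.trim()) // strip per-line indentation and trailing whitespace
    .join('\n')
    .trim(); // drop leading and trailing blank lines
}

const indented = `
      1
      00:00:01,000 --> 00:00:05,000
      First subtitle
`;

console.log(JSON.stringify(normalizeCaptionWhitespace(indented)));
// prints "1\n00:00:01,000 --> 00:00:05,000\nFirst subtitle"
```

Trimming each line rather than the whole string is what removes the indentation in front of the timestamp line, which a single outer `trim()` would leave in place.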
Problem Solved:
The media-captions library expects clean content without leading/trailing whitespace. When using template literals with indentation for inline SRT content, the parser would fail with errors like "cue start timestamp '00:00:00,000' is invalid" even though the SRT format itself was correct.
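To illustrate where the stray whitespace comes from, the following sketch inspects what an indented template literal actually contains. It does not call media-captions; it only shows that the timestamp line no longer starts with the timestamp itself.

```javascript
// An indented template literal keeps the source indentation on every line.
const content = `
    1
    00:00:01,000 --> 00:00:05,000
    First subtitle`;

const lines = content.split('\n');
const timestampLine = lines[2];

console.log(JSON.stringify(timestampLine));
// prints "    00:00:01,000 --> 00:00:05,000"
console.log(timestampLine.startsWith('00:00:01,000')); // false
console.log(timestampLine.trim().startsWith('00:00:01,000')); // true
```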
Ready?
✅ Yes - All tests passing, edge cases covered, and backward compatibility maintained.
Anything Else?
Before (Failed):
const track = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle`
});
// ❌ Error: cue start timestamp `00:00:01,000` is invalid
After (Works):
const track = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle`
});
// ✅ Parses successfully - whitespace normalized automatically
Test Results: ✅ 16/16 tests passing
| Test Category | Tests | Status | Coverage |
|---|---|---|---|
| Whitespace Normalization | 11 | ✅ Pass | Template literals, mixed indentation, leading/trailing whitespace |
| Edge Cases | 3 | ✅ Pass | Tabs, Windows line endings (\r\n), extreme indentation |
| Regression Tests | 2 | ✅ Pass | SRT comma timestamps, VTT period timestamps |
Detailed Coverage:
- ✅ Template literal indentation scenarios
- ✅ Mixed whitespace handling
- ✅ Empty line preservation between subtitle blocks
- ✅ JSON content bypass (no normalization applied)
- ✅ Windows line endings (`\r\n`) compatibility
- ✅ Tab characters and extreme indentation handling
- ✅ SRT comma timestamps (`00:00:01,000`) parsing
- ✅ VTT period timestamps (`00:00:01.000`) parsing
- ✅ Complex content with HTML formatting and special characters
- ✅ Single line content normalization
- ✅ Whitespace-only content handling
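Several of the cases above can be checked directly against the normalization expression. This is a sketch under the assumption that the expression is used exactly as quoted in the Changes list; `normalize` is a hypothetical local name, not part of the TextTrack API.

```javascript
// Hypothetical local wrapper around the expression quoted in this PR.
const normalize = (text) =>
  text.split('\n').map((line) => line.trim()).join('\n').trim();

// Windows line endings: split('\n') leaves a trailing '\r' on each line,
// and trim() removes it because '\r' counts as whitespace.
const crlf = '  1\r\n  00:00:01,000 --> 00:00:05,000\r\n  Subtitle text';
console.log(JSON.stringify(normalize(crlf)));
// prints "1\n00:00:01,000 --> 00:00:05,000\nSubtitle text"

// A whitespace-only separator line becomes an empty line,
// so the gap between the two cue blocks survives.
const twoCues =
  '\t1\n\t00:00:01,000 --> 00:00:05,000\n\tFirst\n\t \n\t2\n\t00:00:06,000 --> 00:00:10,000\n\tSecond';
console.log(normalize(twoCues).split('\n\n').length); // 2

// Whitespace-only input collapses to an empty string.
console.log(normalize(' \n\t\n ') === ''); // true
```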
Manual Test Scenarios:
Template Literal Testing
// Test 1: Basic indented SRT content
const track1 = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: `    1
    00:00:01,000 --> 00:00:05,000
    First subtitle

    2
    00:00:06,000 --> 00:00:10,000
    Second subtitle`
});
// Expected: ✅ Parses successfully
Cross-platform Line Endings
// Test 2: Windows line endings (\r\n)
const windowsContent = "1\r\n00:00:01,000 --> 00:00:05,000\r\nSubtitle text";
const track2 = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: ` ${windowsContent}`
});
// Expected: ✅ Handles \r\n correctly
JSON Content Bypass
// Test 3: JSON content should skip normalization
const track3 = new TextTrack({
  kind: 'subtitles',
  type: 'json',
  content: {
    cues: [{ startTime: 1, endTime: 5, text: "Test" }]
  }
});
// Expected: ✅ No normalization applied to JSON
VTT Format Testing
// Test 4: VTT with period timestamps
const track4 = new TextTrack({
  kind: 'subtitles',
  type: 'vtt',
  content: `    WEBVTT

    1
    00:00:01.000 --> 00:00:05.000
    VTT subtitle`
});
// Expected: ✅ VTT format works with periods
Edge Case Validation
// Test 5: Mixed whitespace (tabs + spaces)
const mixedContent = "\t 1\n\t00:00:01,000 --> 00:00:05,000\n\t Mixed whitespace";
const track5 = new TextTrack({
  kind: 'subtitles',
  type: 'srt',
  content: mixedContent
});
// Expected: ✅ Handles mixed tab/space indentation