Optional Silence Gap Filling for Media Recording

Open Carsinalys opened this issue 4 months ago • 0 comments

Feature Request: Optional Silence Gap Filling for Media Recording

Summary

Add an optional configuration flag to enable automatic silence gap filling in RTP streams to maintain consistent media duration during recording operations.

Motivation

When recording WebRTC streams for meeting capture or similar applications, gaps in RTP packet transmission (due to silence detection, network issues, or DTX) result in recorded media files with inconsistent duration. This creates synchronization issues and makes post-processing more complex.

Currently, Pion WebRTC correctly follows the RTP specification by only forwarding received packets. However, for recording use cases, maintaining temporal consistency is often more important than packet-level accuracy.

Proposed Solution

Add an optional configuration parameter to TrackRemote or MediaEngine that enables automatic silence gap filling:

// Proposed API: neither SilenceFillingConfig nor SetSilenceFilling exists in Pion today.
type SilenceFillingConfig struct {
    Enabled           bool
    MaxGapDuration    time.Duration             // Maximum gap to fill (prevents runaway)
    PacketInterval    time.Duration             // Expected packet interval (codec-dependent)
    SilencePayloadGen func(codec string) []byte // Codec-specific silence frames
}

// Usage example
config := webrtc.Configuration{
    // ... existing config
}

mediaEngine := &webrtc.MediaEngine{}
mediaEngine.SetSilenceFilling(&SilenceFillingConfig{
    Enabled:        true,
    MaxGapDuration: 2 * time.Second,
    PacketInterval: 20 * time.Millisecond, // Opus default
})

api := webrtc.NewAPI(webrtc.WithMediaEngine(mediaEngine))

// The existing configuration is passed through unchanged.
peerConnection, err := api.NewPeerConnection(config)

Implementation Details

Gap Detection

- Monitor time intervals between consecutive RTP packets
- Trigger gap filling when the interval exceeds PacketInterval * threshold (e.g., 2x)

Silence Frame Generation

Generate codec-appropriate silence frames:
- Opus: three-byte silence frame (0xF8 0xFF 0xFE)
- G.711: silence bytes (0xFF for μ-law, 0xD5 for A-law), one per sample
- Other codecs: Configurable via SilencePayloadGen
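
A default generator for the codecs listed above might look like the sketch below. It assumes codecs are identified by MIME-type strings such as "audio/opus", and it takes an extra `samples` argument that the proposed `SilencePayloadGen` signature does not have; both choices, and the `silencePayload` name, are assumptions for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// silencePayload returns one silence frame for the given codec, or nil if
// the codec is unknown. samples is the number of audio samples per packet
// and only matters for G.711, which carries one byte per sample.
func silencePayload(mimeType string, samples int) []byte {
	switch strings.ToLower(mimeType) {
	case "audio/opus":
		// Three-byte Opus frame widely used as a silence packet.
		return []byte{0xF8, 0xFF, 0xFE}
	case "audio/pcmu":
		// G.711 mu-law encodes linear zero as 0xFF.
		return bytes.Repeat([]byte{0xFF}, samples)
	case "audio/pcma":
		// G.711 A-law encodes linear zero as 0xD5.
		return bytes.Repeat([]byte{0xD5}, samples)
	default:
		return nil
	}
}

func main() {
	// 20 ms of G.711 at 8 kHz is 160 samples.
	fmt.Println(len(silencePayload("audio/PCMU", 160)))
	fmt.Println(silencePayload("audio/opus", 960))
}
```

Returning nil for unknown codecs lets the caller fall back to a user-supplied generator or skip filling for that track.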

RTP Header Consistency

- Maintain proper sequence number progression
- Calculate correct timestamps based on codec sample rate
- Preserve SSRC/CSRC values
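
As a rough sketch of these rules, the headers of the filler packets can be derived from the last received header. The `header` struct below is a simplified stand-in rather than Pion's `rtp.Header`, and `fillHeaders` is a hypothetical helper; CSRC handling is omitted for brevity.

```go
package main

import "fmt"

// header holds the RTP header fields relevant to gap filling.
// It is a stand-in for illustration, not Pion's rtp.Header.
type header struct {
	SequenceNumber uint16
	Timestamp      uint32
	SSRC           uint32
}

// fillHeaders returns headers for n filler packets that follow last.
// samplesPerPacket is clockRate * packetInterval, e.g. 48000 * 0.020 = 960
// for Opus at 20 ms.
func fillHeaders(last header, n int, samplesPerPacket uint32) []header {
	out := make([]header, 0, n)
	for i := 1; i <= n; i++ {
		out = append(out, header{
			// Unsigned arithmetic wraps, matching RTP's modular counters.
			SequenceNumber: last.SequenceNumber + uint16(i),
			Timestamp:      last.Timestamp + uint32(i)*samplesPerPacket,
			SSRC:           last.SSRC,
		})
	}
	return out
}

func main() {
	last := header{SequenceNumber: 65534, Timestamp: 1000, SSRC: 42}
	for _, h := range fillHeaders(last, 3, 960) {
		fmt.Println(h.SequenceNumber, h.Timestamp, h.SSRC)
	}
}
```

Real packets that arrive after the gap would then need their sequence numbers shifted by the number of fillers inserted so far, which this sketch leaves out.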

Safety Mechanisms

- Limit maximum gap duration to prevent memory exhaustion
- Configurable threshold for gap detection sensitivity
- Optional callback for gap detection events

Use Cases

- Meeting Recording: Maintain consistent audio/video duration across all participants
- Media Archival: Ensure recorded files have accurate temporal representation
- Live Streaming: Prevent audio dropouts in real-time applications
- Compliance Recording: Meet regulatory requirements for complete session capture

Backward Compatibility

- Feature is disabled by default to maintain current behavior
- No impact on existing applications unless explicitly enabled
- Configuration is optional and uses sensible defaults when enabled

Alternative Approaches Considered

- Application-level implementation: While possible, requires duplicating gap detection logic across applications
- Post-processing: Adds complexity and requires temporal analysis of recorded files
- MediaWriter interface: Could be implemented at the writer level, but loses RTP-level timing information

Related Work

- Browser WebRTC: Often includes automatic comfort noise insertion
- GStreamer: Provides silence detection and insertion elements
- FFmpeg: Has silence detection and padding filters

Implementation Scope

Phase 1: Core functionality

- Basic gap detection and Opus silence filling
- Configuration interface
- Unit tests

Phase 2: Extended support

- Additional codec support (G.711, G.722, etc.)
- Performance optimizations
- Integration tests

Phase 3: Advanced features

- Adaptive gap detection based on network conditions
- Silence frame quality levels
- Metrics and monitoring hooks

Community Impact

This feature would benefit:

- Recording Applications: Simplified and more reliable media capture
- Educational Platforms: Consistent lesson recordings
- Enterprise Communications: Meeting archival and compliance
- Live Streaming Services: Improved audio quality

Questions for Maintainers

- Would this feature align with Pion's design philosophy?
- Should this be implemented at the TrackRemote level or MediaEngine level?
- Are there any concerns about performance impact when enabled?
- Would you prefer this as a separate package or integrated into core?

Environment:

Pion WebRTC version: v4.1.1
Go version: 1.24.3

Jun 05 '25 12:06 Carsinalys
