
Optional Silence Gap Filling for Media Recording

Open Carsinalys opened this issue 4 months ago • 0 comments

Feature Request: Optional Silence Gap Filling for Media Recording

Summary

Add an optional configuration flag to enable automatic silence gap filling in RTP streams to maintain consistent media duration during recording operations.

Motivation

When recording WebRTC streams for meeting capture or similar applications, gaps in RTP packet transmission (due to silence suppression such as discontinuous transmission (DTX), packet loss, or other network issues) result in recorded media files whose duration does not match the real length of the session. This creates synchronization issues between tracks and makes post-processing more complex.

Currently, Pion WebRTC correctly follows the RTP specification by only forwarding received packets. However, for recording use cases, maintaining temporal consistency is often more important than packet-level accuracy.

Proposed Solution

Add an optional configuration parameter to TrackRemote or MediaEngine that enables automatic silence gap filling:

// Proposed type; it does not exist in Pion today.
type SilenceFillingConfig struct {
    Enabled           bool
    MaxGapDuration    time.Duration             // Maximum gap to fill (prevents runaway)
    PacketInterval    time.Duration             // Expected packet interval (codec-dependent)
    SilencePayloadGen func(codec string) []byte // Codec-specific silence frames
}

// Usage example
config := webrtc.Configuration{
    // ... existing config
}

mediaEngine := &webrtc.MediaEngine{}
// SetSilenceFilling is the proposed method, not an existing API.
mediaEngine.SetSilenceFilling(&webrtc.SilenceFillingConfig{
    Enabled:        true,
    MaxGapDuration: 2 * time.Second,
    PacketInterval: 20 * time.Millisecond, // Opus default
})

api := webrtc.NewAPI(webrtc.WithMediaEngine(mediaEngine))
peerConnection, err := api.NewPeerConnection(config)
if err != nil {
    panic(err)
}

Implementation Details

Gap Detection

  • Monitor time intervals between consecutive RTP packets
  • Trigger gap filling when interval exceeds PacketInterval * threshold (e.g., 2x)

Silence Frame Generation

  • Generate codec-appropriate silence frames:
    • Opus: silence frames (for example, the commonly used 3-byte packet 0xF8 0xFF 0xFE)
    • G.711: constant silence bytes (0xFF for PCMU, 0xD5 for PCMA)
    • Other codecs: Configurable via SilencePayloadGen
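A default generator matching the proposed `SilencePayloadGen` signature might look something like the sketch below. It assumes the codec string is a MIME type such as `audio/opus`, that G.711 frames cover 20 ms (160 samples at 8 kHz), and that the widely used 3-byte Opus silence packet is acceptable; `defaultSilencePayload` and `repeatByte` are hypothetical names.

```go
package main

import "strings"

// g711FrameSamples assumes 20 ms frames at 8 kHz, one byte per sample.
const g711FrameSamples = 160

// defaultSilencePayload returns one silence frame for the given codec MIME
// type, or nil when the codec is not recognized.
func defaultSilencePayload(codec string) []byte {
	switch strings.ToLower(codec) {
	case "audio/opus":
		// Commonly used 20 ms Opus silence packet (TOC byte 0xF8).
		return []byte{0xF8, 0xFF, 0xFE}
	case "audio/pcmu":
		return repeatByte(0xFF, g711FrameSamples) // mu-law zero amplitude
	case "audio/pcma":
		return repeatByte(0xD5, g711FrameSamples) // A-law zero amplitude
	default:
		return nil // caller should fall back to a user-supplied generator
	}
}

// repeatByte returns a slice of n copies of b (empty when n <= 0).
func repeatByte(b byte, n int) []byte {
	if n <= 0 {
		return []byte{}
	}
	out := make([]byte, n)
	for i := range out {
		out[i] = b
	}
	return out
}
```

Returning nil for unknown codecs lets the caller decide whether to skip filling or to consult a custom generator.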

RTP Header Consistency

  • Maintain proper sequence number progression
  • Calculate correct timestamps based on codec sample rate
  • Preserve SSRC/CSRC values

Safety Mechanisms

  • Limit maximum gap duration to prevent memory exhaustion
  • Configurable threshold for gap detection sensitivity
  • Optional callback for gap detection events

Use Cases

  1. Meeting Recording: Maintain consistent audio/video duration across all participants
  2. Media Archival: Ensure recorded files have accurate temporal representation
  3. Live Streaming: Prevent audio dropouts in real-time applications
  4. Compliance Recording: Meet regulatory requirements for complete session capture

Backward Compatibility

  • Feature is disabled by default to maintain current behavior
  • No impact on existing applications unless explicitly enabled
  • Configuration is optional and uses sensible defaults when enabled

Alternative Approaches Considered

  1. Application-level implementation: While possible, requires duplicating gap detection logic across applications
  2. Post-processing: Adds complexity and requires temporal analysis of recorded files
  3. MediaWriter interface: Could be implemented at the writer level, but loses RTP-level timing information

Related Work

  • Browser WebRTC: Often includes automatic comfort noise insertion
  • GStreamer: Provides silence detection and insertion elements
  • FFmpeg: Has silence detection and padding filters

Implementation Scope

Phase 1: Core functionality

  • Basic gap detection and Opus silence filling
  • Configuration interface
  • Unit tests

Phase 2: Extended support

  • Additional codec support (G.711, G.722, etc.)
  • Performance optimizations
  • Integration tests

Phase 3: Advanced features

  • Adaptive gap detection based on network conditions
  • Silence frame quality levels
  • Metrics and monitoring hooks

Community Impact

This feature would benefit:

  • Recording Applications: Simplified and more reliable media capture
  • Educational Platforms: Consistent lesson recordings
  • Enterprise Communications: Meeting archival and compliance
  • Live Streaming Services: Improved audio quality

Questions for Maintainers

  1. Would this feature align with Pion's design philosophy?
  2. Should this be implemented at the TrackRemote level or MediaEngine level?
  3. Are there any concerns about performance impact when enabled?
  4. Would you prefer this as a separate package or integrated into core?

Environment:

  • Pion WebRTC version: v4.1.1
  • Go version: 1.24.3

Carsinalys, Jun 05 '25 12:06