Feature Request: Optional Silence Gap Filling for Media Recording
Summary
Add an optional configuration flag that enables automatic silence gap filling in RTP streams, so that recorded media keeps a consistent duration during recording operations.
Motivation
When recording WebRTC streams for meeting capture or similar applications, gaps in RTP packet transmission (due to silence detection, network issues, or DTX) result in recorded media files with inconsistent duration. This creates synchronization issues and makes post-processing more complex.
Currently, Pion WebRTC correctly follows the RTP specification by only forwarding received packets. However, for recording use cases, maintaining temporal consistency is often more important than packet-level accuracy.
Proposed Solution
Add an optional configuration parameter to TrackRemote or MediaEngine that enables automatic silence gap filling:
```go
type SilenceFillingConfig struct {
    Enabled           bool
    MaxGapDuration    time.Duration             // Maximum gap to fill (prevents runaway)
    PacketInterval    time.Duration             // Expected packet interval (codec-dependent)
    SilencePayloadGen func(codec string) []byte // Codec-specific silence frames
}

// Usage example
config := webrtc.Configuration{
    // ... existing config
}

mediaEngine := &webrtc.MediaEngine{}
mediaEngine.SetSilenceFilling(&SilenceFillingConfig{
    Enabled:        true,
    MaxGapDuration: 2 * time.Second,
    PacketInterval: 20 * time.Millisecond, // Opus default
})

api := webrtc.NewAPI(webrtc.WithMediaEngine(mediaEngine))
```
Implementation Details
Gap Detection
- Monitor time intervals between consecutive RTP packets
- Trigger gap filling when the interval exceeds `PacketInterval * threshold` (e.g., 2x); a sketch of this logic follows below
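For illustration, a minimal sketch of this detection logic is shown below. The `gapDetector` type, its fields, and its method are hypothetical (not part of Pion); they simply mirror the `PacketInterval`, threshold, and `MaxGapDuration` knobs proposed above.

```go
package silencefill

import "time"

// gapDetector is a hypothetical helper that tracks packet arrival times and
// reports how many filler packets a detected gap would need.
type gapDetector struct {
    packetInterval time.Duration // expected spacing, e.g. 20ms for Opus
    threshold      float64       // a gap is declared at packetInterval * threshold
    maxGap         time.Duration // never fill more than this (MaxGapDuration)
    lastArrival    time.Time
}

// onPacket is called for every received RTP packet and returns the number of
// silence packets that should be synthesized before handling the new packet.
func (d *gapDetector) onPacket(now time.Time) int {
    defer func() { d.lastArrival = now }()

    if d.lastArrival.IsZero() {
        return 0 // first packet: nothing to compare against
    }

    elapsed := now.Sub(d.lastArrival)
    if float64(elapsed) < float64(d.packetInterval)*d.threshold {
        return 0 // within normal jitter, no gap
    }

    // Clamp to the configured maximum so a long outage cannot trigger
    // unbounded packet generation (see Safety Mechanisms).
    if elapsed > d.maxGap {
        elapsed = d.maxGap
    }

    // One filler packet per missing packet interval, excluding the slot
    // taken by the packet that just arrived.
    missing := int(elapsed/d.packetInterval) - 1
    if missing < 0 {
        missing = 0
    }
    return missing
}
```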
Silence Frame Generation
- Generate codec-appropriate silence frames:
  - Opus: DTX frames (`0xF8 0xFF`)
  - G.711: Silence patterns
  - Other codecs: Configurable via `SilencePayloadGen` (see the sketch after this list)
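For illustration, a possible `SilencePayloadGen` could look like the sketch below. The codec keys (MIME-type strings) and the 20 ms payload sizes are assumptions; the Opus bytes are the pattern listed above, and the G.711 values are the standard µ-law (0xFF) and A-law (0xD5) silence samples.

```go
package silencefill

import "bytes"

// silencePayload sketches a codec-aware SilencePayloadGen. The codec keys
// and payload sizes (20ms packets) are illustrative assumptions.
func silencePayload(codec string) []byte {
    switch codec {
    case "audio/opus":
        // Opus DTX frame pattern from the proposal above.
        return []byte{0xF8, 0xFF}
    case "audio/PCMU":
        // 20ms at 8kHz = 160 samples of mu-law digital silence.
        return bytes.Repeat([]byte{0xFF}, 160)
    case "audio/PCMA":
        // 20ms at 8kHz = 160 samples of A-law digital silence.
        return bytes.Repeat([]byte{0xD5}, 160)
    default:
        // Unknown codec: return nil so the caller can skip filling.
        return nil
    }
}
```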
RTP Header Consistency
- Maintain proper sequence number progression
- Calculate correct timestamps based on codec sample rate
- Preserve SSRC/CSRC values
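To make the header handling concrete, here is a rough sketch of how a single filler packet could be derived from the last received packet using `github.com/pion/rtp`. The function name and parameters are hypothetical, and how later real packets would be re-sequenced after filling is deliberately left out of the sketch.

```go
package silencefill

import "github.com/pion/rtp"

// makeFillerPacket sketches how the nth synthesized packet of a gap could be
// built from the last real packet. samplesPerPacket is codec-dependent
// (e.g. 960 for 20ms of Opus at 48kHz); n starts at 1 for the first filler.
func makeFillerPacket(last *rtp.Packet, payload []byte, samplesPerPacket uint32, n uint16) *rtp.Packet {
    return &rtp.Packet{
        Header: rtp.Header{
            Version:     2,
            PayloadType: last.PayloadType,
            // Sequence numbers progress monotonically; uint16 arithmetic
            // handles wraparound for free.
            SequenceNumber: last.SequenceNumber + n,
            // Timestamps advance by the codec's samples per packet so the
            // recording sees a continuous timeline.
            Timestamp: last.Timestamp + uint32(n)*samplesPerPacket,
            // SSRC and CSRC are preserved from the original stream.
            SSRC: last.SSRC,
            CSRC: last.CSRC,
        },
        Payload: payload,
    }
}
```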
Safety Mechanisms
- Limit maximum gap duration to prevent memory exhaustion
- Configurable threshold for gap detection sensitivity
- Optional callback for gap detection events
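As one possible shape for the callback mentioned above, a hypothetical sketch follows; neither `GapEvent` nor the extra configuration fields exist in Pion and are shown only to make the safety list concrete.

```go
package silencefill

import "time"

// GapEvent is a hypothetical payload for the optional gap-detection callback.
type GapEvent struct {
    SSRC          uint32
    GapDuration   time.Duration // measured silence before filling started
    FilledPackets int           // number of synthesized packets inserted
    Truncated     bool          // true if the gap was capped at MaxGapDuration
}

// Hypothetical additions to the proposed SilenceFillingConfig:
//
//	GapThreshold  float64        // fill once elapsed > PacketInterval * GapThreshold
//	OnGapDetected func(GapEvent) // invoked once per detected gap, for metrics/logging
```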
Use Cases
- Meeting Recording: Maintain consistent audio/video duration across all participants
- Media Archival: Ensure recorded files have accurate temporal representation
- Live Streaming: Prevent audio dropouts in real-time applications
- Compliance Recording: Meet regulatory requirements for complete session capture
Backward Compatibility
- Feature is disabled by default to maintain current behavior
- No impact on existing applications unless explicitly enabled
- Configuration is optional and uses sensible defaults when enabled
Alternative Approaches Considered
- Application-level implementation: While possible, requires duplicating gap detection logic across applications
- Post-processing: Adds complexity and requires temporal analysis of recorded files
- MediaWriter interface: Could be implemented at the writer level, but loses RTP-level timing information
Related Work
- Browser WebRTC: Often includes automatic comfort noise insertion
- GStreamer: Provides silence detection and insertion elements
- FFmpeg: Has silence detection and padding filters
Implementation Scope
Phase 1: Core functionality
- Basic gap detection and Opus silence filling
- Configuration interface
- Unit tests
Phase 2: Extended support
- Additional codec support (G.711, G.722, etc.)
- Performance optimizations
- Integration tests
Phase 3: Advanced features
- Adaptive gap detection based on network conditions
- Silence frame quality levels
- Metrics and monitoring hooks
Community Impact
This feature would benefit:
- Recording Applications: Simplified and more reliable media capture
- Educational Platforms: Consistent lesson recordings
- Enterprise Communications: Meeting archival and compliance
- Live Streaming Services: Improved audio quality
Questions for Maintainers
- Would this feature align with Pion's design philosophy?
- Should this be implemented at the `TrackRemote` level or the `MediaEngine` level?
- Are there any concerns about performance impact when enabled?
- Would you prefer this as a separate package or integrated into core?
Environment:
- Pion WebRTC version: v4.1.1
- Go version: 1.24.3