
Silent gaps occur when using chunked encoding

Open co-re-co opened this issue 3 months ago • 4 comments

When using chunked encoding, the output video file has short silent gaps at every point where segments are joined, which causes audio glitches during playback. The encoding configuration is based on program.yml (modified to use 1-pass transcodes), with the segment length set to twice the GOP length (7.68 seconds). The individual encoded segment files appear flawless, but the silent gaps appear after stitching. How do I avoid this?

co-re-co, Oct 10 '25 07:10

Although I haven't tested this myself, as far as I understand, segmented encoding with audio will work fine if:

  • The audio track is muxed together with the video, i.e. it is not a separate audio track
  • The segment length is a multiple of both the GOP duration and the audio frame duration (I think your 7.68 s segment length should be OK)
  • The codec is fdk_aac
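The second condition can be checked with exact rational arithmetic. This is a minimal sketch assuming a 96-frame GOP at 25 fps (so 7.68 s is two GOPs, as in the question) and a 48 kHz AAC-LC track. The sample rate is an assumption, since the thread does not state it, and the function names are hypothetical rather than part of Encore.

```python
from fractions import Fraction

AAC_FRAME_SAMPLES = 1024  # one AAC-LC frame holds 1024 samples per channel


def segment_alignment(segment_len: str, gop_frames: int, fps: int, sample_rate: int):
    """Return (GOPs per segment, AAC frames per segment) as exact fractions."""
    # Parse from a string: Fraction(7.68) would inherit binary float error.
    seg = Fraction(segment_len)
    gops = seg / Fraction(gop_frames, fps)
    aac_frames = seg * sample_rate / AAC_FRAME_SAMPLES
    return gops, aac_frames


def is_aligned(segment_len: str, gop_frames: int, fps: int, sample_rate: int) -> bool:
    """True if a segment holds a whole number of GOPs and of AAC frames."""
    gops, aac_frames = segment_alignment(segment_len, gop_frames, fps, sample_rate)
    return gops.denominator == 1 and aac_frames.denominator == 1


# 7.68 s at 48 kHz: exactly 2 GOPs and 360 AAC frames, so aligned.
print(is_aligned("7.68", 96, 25, 48000))
# 7.68 s at 44.1 kHz: 330.75 AAC frames, so not aligned.
print(is_aligned("7.68", 96, 25, 44100))
```

Using `Fraction` instead of floats avoids false negatives from rounding, e.g. a remainder of 1e-15 being mistaken for a misaligned segment.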

This means that with the program profile (I am assuming you are using this one), only the audio tracks muxed into the output video files would be OK; the separate audio tracks would have glitches because the AAC priming samples are not removed when stitching.

You might also want to have a look at this release from a fork, which avoids this problem by doing chunked encoding for the video while encoding the audio without chunking: https://github.com/Eyevinn/encore/releases/tag/v0.2.9-1
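The idea behind that release (chunked video, unchunked audio) can be sketched as a plan of ffmpeg invocations. This is a hypothetical illustration, not the fork's actual implementation: the file names, `libx264`/`aac` encoders and 128k bitrate are assumptions, and a real pipeline would also need the chunk boundaries to fall on keyframes (for example via `force_key_frames`, as in the profile posted later in the thread).

```python
import math
import shlex


def build_split_commands(src: str, duration_s: float, seg_len_s: float):
    """Hypothetical plan: video encoded in chunks, audio encoded once, then muxed."""
    commands: list[list[str]] = []
    chunk_names: list[str] = []
    n_segments = math.ceil(duration_s / seg_len_s)
    for i in range(n_segments):
        name = f"video_{i:04d}.mp4"
        chunk_names.append(name)
        # Video-only chunk: seek to the chunk start, encode seg_len_s seconds, drop audio.
        commands.append([
            "ffmpeg", "-ss", f"{i * seg_len_s:.3f}", "-i", src,
            "-t", f"{seg_len_s:.3f}", "-an", "-c:v", "libx264", name,
        ])
    # Audio encoded once over the whole input (-vn), so priming occurs only at the start.
    commands.append(["ffmpeg", "-i", src, "-vn", "-c:a", "aac", "-b:a", "128k", "audio.mp4"])
    # Join the video chunks with the concat demuxer (list file written separately).
    commands.append(["ffmpeg", "-f", "concat", "-safe", "0", "-i", "chunks.txt",
                     "-c", "copy", "video.mp4"])
    # Mux the joined video with the unchunked audio without re-encoding.
    commands.append(["ffmpeg", "-i", "video.mp4", "-i", "audio.mp4",
                     "-map", "0:v", "-map", "1:a", "-c", "copy", "output.mp4"])
    return commands, chunk_names


def concat_list(chunk_names: list[str]) -> str:
    """Contents of chunks.txt in the concat demuxer's `file '...'` format."""
    return "".join(f"file '{name}'\n" for name in chunk_names)


cmds, names = build_split_commands("test.mp4", 600, 7.68)
for cmd in cmds[-3:]:
    print(shlex.join(cmd))
```

Because the audio is never cut, there is only one set of priming samples, at the very start, and the container can compensate for it in the usual way.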

grusell, Oct 10 '25 11:10

Hello grusell, my setup should meet the conditions you listed, in particular "Audio track is muxed together with video, i.e. not separate audio track" and "codec is fdk_aac", since I am using the project's default test configuration. The submitted job is as follows:

    {
      "externalId": "any-string",
      "profile": "program",
      "profileParams": {},
      "outputFolder": "/data/output",
      "baseName": "test_1",
      "priority": 0,
      "segmentLength": 7.68,
      "inputs": [
        { "type": "AudioVideo", "uri": "/mnt/video/input/test.mp4", "copyTs": true }
      ]
    }

Is there something I'm doing wrong?

co-re-co, Oct 10 '25 14:10

Grusell beat me to it, and I can only confirm his summary.

When you do AAC transcoding, the encoder always adds priming samples to the start of each audio segment. Normally, when video and audio are muxed into the same container (like MP4), this does not cause problems because the start time / timescale of the video takes priority. However, if you have an audio-only encode (like AudioEncode in Encore) and use ffmpeg to concatenate segments of AAC audio (as we do with chunked encoding in Encore), those priming samples won't be handled correctly, and the audio will have periodic gaps of digital silence and will drift out of sync.
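A toy model of the effect described above: each "encoded" segment starts with a few samples of silence standing in for AAC priming, and naive stitching leaves that silence at every join. The 1024-sample priming and 48 kHz rate used for the duration estimate are assumptions (the real priming length depends on the encoder), and this sketch ignores end padding and the container-level delay signalling that players normally use.

```python
def encode_segment(samples: list[int], priming: int) -> list[int]:
    """Toy stand-in for an AAC encode: prepends `priming` samples of silence."""
    return [0] * priming + samples


def naive_concat(segments: list[list[int]]) -> list[int]:
    """Stitch segments as-is, keeping every segment's priming samples."""
    return [s for seg in segments for s in seg]


def trimmed_concat(segments: list[list[int]], priming: int) -> list[int]:
    """Stitch segments after dropping the priming samples at the start of each."""
    return [s for seg in segments for s in seg[priming:]]


def gap_ms(priming: int, sample_rate: int) -> float:
    """Duration of the silence left at one join."""
    return 1000 * priming / sample_rate


def drift_ms(joins: int, priming: int, sample_rate: int) -> float:
    """Total extra audio after `joins` untrimmed joins."""
    return 1000 * joins * priming / sample_rate


source = list(range(1, 13))                     # 12 non-silent "samples"
chunks = [source[i:i + 4] for i in range(0, 12, 4)]
encoded = [encode_segment(c, priming=2) for c in chunks]

print(naive_concat(encoded))                    # zeros appear at every join
print(trimmed_concat(encoded, 2) == source)     # trimming restores the source
print(gap_ms(1024, 48000))                      # about 21.3 ms of silence per join
print(drift_ms(78, 1024, 48000))                # 10 min at 7.68 s -> 78 joins, ~1.66 s
```

The last line shows why the gaps also cause A/V drift: every join adds audio that has no matching video.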

The way we use this feature in production for OTT VoD, which happens to solve the particular problem you are facing, is that we simply do not transcode a separate audio track. Instead, we make sure that each rung in the ladder has both audio and video, and then we pick the audio from the highest-quality rung when we package the transcoded files into HLS and DASH segments.

So my suggestion is: Make sure that the transcoding profile that you are using does not have a separate audioEncode with only AAC in it.

So the bitrate ladder in the profile would be used like this:

  • Quality level 1: 1080p video -> gets packaged; AAC-LC -> gets packaged
  • Quality level 2: 720p video -> gets packaged; AAC-LC
  • Quality level 3: 540p video -> gets packaged; AAC-LC
  • Quality level 4: 360p video -> gets packaged; AAC-LC
  • Quality level 5: 234p video -> gets packaged; AAC-LC
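The packaging rule above (package every rung's video, but audio only from the top rung) can be written as a small selection over the ladder. A minimal sketch: the `Rung` class and `packaging_plan` function are hypothetical names, and "highest quality" is approximated by the tallest video, which matches this ladder but is an assumption in general.

```python
from dataclasses import dataclass


@dataclass
class Rung:
    level: int
    height: int   # video height in pixels
    audio: str    # every rung carries its own muxed audio track


LADDER = [
    Rung(1, 1080, "AAC-LC"),
    Rung(2, 720, "AAC-LC"),
    Rung(3, 540, "AAC-LC"),
    Rung(4, 360, "AAC-LC"),
    Rung(5, 234, "AAC-LC"),
]


def packaging_plan(ladder: list[Rung]) -> dict:
    """Package every rung's video, but take audio only from the highest-quality rung."""
    top = max(ladder, key=lambda r: r.height)
    return {
        "video_levels": [r.level for r in ladder],
        "audio_level": top.level,
    }


print(packaging_plan(LADDER))
# {'video_levels': [1, 2, 3, 4, 5], 'audio_level': 1}
```

The audio on the lower rungs is still encoded (muxed with its video) but simply not used by the packager.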

Lufferman, Oct 10 '25 21:10

Thanks, Lufferman, that was helpful. Your explanation helped me understand what causes the silent gaps, but my configuration file is already a modified version of program.yml: it keeps a single quality level for segmented transcoding and does not process video and audio separately. At least that's how I interpreted it. Below is my configuration file:

    name: vod_264_1080
    description: vod_264_1080 profile
    scaling: bicubic
    joinSegmentParams:
      movflags: +faststart

    encodes:
      - type: X264Encode
        suffix: _x264_3100
        twoPass: false
        height: 1080
        params:
          b:v: 3100k
          maxrate: 4700k
          bufsize: 6200k
          r: 25
          fps_mode: cfr
          pix_fmt: yuv420p
          force_key_frames: expr:not(mod(n,96))
          profile:v: high
          #level: 4.1
        x264-params:
          deblock: 0,0
          aq-mode: 1
          aq-strength: 1.0
          b-adapt: 2
          bframes: 6
          b-bias: 0
          b-pyramid: 2
          chroma-qp-offset: -2
          direct: auto
          rc-lookahead: 60
          keyint: 96
          keyint_min: 96
          me: hex
          merange: 16
          cabac: 1
          partitions: all
          ref: 4
          scenecut: 40
          subme: 9
          trellis: 2
          weightp: 2
        audioEncode:
          type: AudioEncode
          codec: aac
          bitrate: 128k
          suffix: STEREO
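Lufferman's suggestion (no separate audioEncode) can be checked mechanically on a profile like the one above. A sketch, assuming the YAML has already been parsed into a dict (for example with a YAML library) and that a separate audio track would appear as a top-level encode of type `AudioEncode`; the `audio_layout` helper is hypothetical. For this profile it reports no separate audio track and one muxed track whose codec is `aac`, which is worth comparing with the fdk_aac condition listed earlier in the thread.

```python
def audio_layout(profile: dict) -> dict:
    """Summarise where audio is encoded in a profile dict loaded from YAML."""
    separate = []  # top-level AudioEncode entries: audio-only outputs
    muxed = []     # audioEncode nested in a video encode: muxed with that video
    for enc in profile.get("encodes", []):
        if enc.get("type") == "AudioEncode":
            separate.append(enc.get("suffix"))
        elif "audioEncode" in enc:
            muxed.append(enc["audioEncode"].get("codec"))
    return {"separate_audio": separate, "muxed_audio_codecs": muxed}


# Trimmed-down dict mirroring the posted profile (video params omitted).
profile = {
    "name": "vod_264_1080",
    "encodes": [
        {
            "type": "X264Encode",
            "suffix": "_x264_3100",
            "height": 1080,
            "audioEncode": {"type": "AudioEncode", "codec": "aac",
                            "bitrate": "128k", "suffix": "STEREO"},
        }
    ],
}

print(audio_layout(profile))
# {'separate_audio': [], 'muxed_audio_codecs': ['aac']}
```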

co-re-co, Oct 11 '25 07:10