dcadec How to downmix 5.1 into stereo properly? I got overflow even with the -3DB Matrix by Summing the individual output channels.

Hey! First of all, thanks for your work on this. I tried to get a stereo downmix of DTS files. However, I found that the -2 option does not work if the DTS file doesn't contain downmix coefficients. So I tried to sum the individual audio tracks with the following formula: Lo = FL + 0.707*(C+SL); Ro = FR + 0.707*(C+SR); This leaves the final stereo file distorted. I compared with ffmpeg: its downmix seems to make the volume very low (about -9 dB).

So how should I properly downmix DTS files that don't contain coefficient info?

Looking forward to your reply. Thanks!

Jun 18 '15 09:06 eviluess

Are you sure you're not supposed to subtract 0.707 from the center and rear channels, before combining them into the output stereo channel?

Jul 02 '15 20:07 MarcusJohnson91

Do you mean the correct formula should be: Lo = FL - 0.707*(C+SL); Ro = FR - 0.707*(C+SR)?

Jul 03 '15 07:07 eviluess

I mean Lo = FL + (C - 0.707), + (SL - 0.707)

Jul 03 '15 08:07 MarcusJohnson91

I'm confused. I think this would introduce a DC offset of -1.414.

Jul 03 '15 08:07 eviluess

Subtraction is clearly the wrong way to go. You are supposed to multiply C by 0.707 in any case, since that compensates for cloning the (single) C channel into two channels. Most people also multiply the surrounds by that factor to keep them from interfering with the front sounds too much (i.e. your original formula).

However, this formula can cause overflows if the original audio signal is already at full volume. The only way to reliably combat this is to reduce the overall volume to avoid clipping - this is what ffmpeg does. It will result in a somewhat quieter track, but it's the only way to be absolutely sure that it will never overflow/clip.
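To make the overflow concrete, here is a minimal Python sketch of the formula from the original post, assuming float samples in the range [-1.0, 1.0] and ignoring the LFE channel. The names `downmix_sample`, `C_GAIN` and `S_GAIN` are made up for illustration and are not part of dcadec or ffmpeg.

```python
import math

# -3 dB gains for the center and surround channels (about 0.707).
C_GAIN = 1 / math.sqrt(2)
S_GAIN = 1 / math.sqrt(2)

def downmix_sample(fl, fr, c, sl, sr):
    """Lo/Ro downmix of one sample frame; inputs are floats in [-1.0, 1.0]."""
    lo = fl + C_GAIN * c + S_GAIN * sl
    ro = fr + C_GAIN * c + S_GAIN * sr
    return lo, ro

# Worst case: every channel is at full scale at the same instant.
lo, ro = downmix_sample(1.0, 1.0, 1.0, 1.0, 1.0)
worst_case = 1.0 + C_GAIN + S_GAIN  # about 2.414, far above the 1.0 limit

# Fixed attenuation: dividing by the worst-case sum can never clip,
# but it makes every track quieter, including ones that never get close.
safe_lo = lo / worst_case
safe_ro = ro / worst_case
print(f"raw peak {lo:.3f}, attenuated peak {safe_lo:.3f}")
```

With real material the channels rarely peak together, which is why the fixed attenuation usually sounds quieter than strictly necessary.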

Jul 03 '15 08:07 Nevcairiel

The only way to reliably combat this is to reduce the overall volume to avoid clipping - this is what ffmpeg does.

Actually, libswresample and libavresample have different defaults. libswresample doesn't do it AFAIK, which will sometimes result in clipping. I guess it boils down to the user's preference: get audio that's "too silent", or audio that might clip.

Jul 03 '15 09:07 ghost

So there's no way to get a downmix with full (unattenuated) gain and without any clipping?

Jul 03 '15 09:07 eviluess

libswresample doesn't do it AFAIK, which will sometimes result in clipping.

swresample is a bit dumb. It does it when you use integer internal/output, but doesn't with float output, so yeah.

Jul 03 '15 09:07 Nevcairiel

So there's no way to get a downmix with full (unattenuated) gain and without any clipping?

Not without analyzing the audio first to get its real peak information. A general 1-pass operation has to assume that all channels can contain full-range audio at the same time, and if you combine 1.0 + 0.707 + 0.707 it will overflow, so you have to reduce the volume, i.e. effectively divide by 2.414 (a reduction of about 7.7 dB).
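As a rough illustration of the two options, the Python sketch below first computes the worst-case attenuation in dB, then shows a two-pass alternative: measure the real peak of the already downmixed audio and scale only if it exceeds full scale. The helpers `attenuation_db` and `peak_normalize` are hypothetical names, and the frames are assumed to be (Lo, Ro) float pairs held in memory.

```python
import math

def attenuation_db(gain_sum):
    """Attenuation in dB needed to bring a worst-case sum back to 1.0."""
    return 20 * math.log10(gain_sum)

def peak_normalize(frames, ceiling=1.0):
    """Two-pass normalization of a list of (lo, ro) float pairs.

    Pass 1 finds the real peak; pass 2 applies a gain only if the
    peak exceeds the ceiling, so quiet material is left untouched.
    """
    peak = max((max(abs(lo), abs(ro)) for lo, ro in frames), default=0.0)
    gain = 1.0 if peak <= ceiling else ceiling / peak
    return [(lo * gain, ro * gain) for lo, ro in frames], gain

worst_case_db = attenuation_db(1.0 + 2 / math.sqrt(2))
print(f"worst-case attenuation: {worst_case_db:.2f} dB")

# Example: a downmix whose real peak is 1.6, well below the 2.414 worst case.
frames = [(0.5, -0.25), (1.6, 0.8)]
normalized, gain = peak_normalize(frames)
```

The trade-off is that the two-pass version needs the whole track before it can output anything, so it does not fit real-time playback.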

Jul 03 '15 09:07 Nevcairiel

That's why I started the discussion.

I found that some players request a 5.1 channel configuration from the sound card via the waveOutOpen API, by setting the corresponding channel flags (0x3F) in dwChannelMask, and this never overflows even when I turn both the player volume and the system volume to the maximum. The sound is louder than when playing the downmix generated by ffmpeg (-7 dB) directly in the player.
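For reference, 0x3F is simply the bitwise OR of the six speaker-position flags for a 5.1 layout. The short Python sketch below mirrors the flag values defined in the Windows SDK headers; it only builds the mask and does not call waveOutOpen.

```python
# Speaker-position flags as defined in the Windows SDK headers.
SPEAKER_FRONT_LEFT = 0x1
SPEAKER_FRONT_RIGHT = 0x2
SPEAKER_FRONT_CENTER = 0x4
SPEAKER_LOW_FREQUENCY = 0x8
SPEAKER_BACK_LEFT = 0x10
SPEAKER_BACK_RIGHT = 0x20

# 5.1 layout: front left/right, center, LFE, back left/right.
mask_5_1 = (
    SPEAKER_FRONT_LEFT
    | SPEAKER_FRONT_RIGHT
    | SPEAKER_FRONT_CENTER
    | SPEAKER_LOW_FREQUENCY
    | SPEAKER_BACK_LEFT
    | SPEAKER_BACK_RIGHT
)
print(hex(mask_5_1))
```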

So it seems the sound card can do the downmix correctly? It has no chance to run a peak-scanning pass first.

Jul 03 '15 11:07 eviluess

Hi there, sorry to revive this old discussion, but I'm looking for someone able to correctly implement the "Novel 5.1 Downmix Algorithm with Improved Dialogue Intelligibility" research: ndmix

Similar to the state-of-the-art downmix methods, only 5 channels are taken into consideration: L, R, C, Ls and Rs. We can represent the downmix operation in the form of the following equation:

lt[n] = l[n] + 0.707 * c[n] + (dlev - 1) * e[n] + 0.5 * ls[n]
rt[n] = r[n] + 0.707 * c[n] + (dlev - 1) * e[n] + 0.5 * rs[n]

where e[n] is the extracted voice signal, dlev represents the dialogue level and all considered signals are represented in the digital domain, in which n denotes the sample index.
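Read literally, the two equations amount to the per-sample computation sketched below in Python. This is only a transcription of the quoted formula: the function name `ndmix_frame` is hypothetical, and the extracted voice signal e is taken as a ready-made input, since the quoted text does not say how it is extracted.

```python
def ndmix_frame(l, r, c, ls, rs, e, dlev):
    """One sample of the quoted downmix; all inputs are floats.

    e is the extracted voice sample and dlev the dialogue level.
    With dlev == 1.0 the voice term vanishes and only the plain
    matrix (0.707 for center, 0.5 for surrounds) remains.
    """
    voice = (dlev - 1.0) * e
    lt = l + 0.707 * c + voice + 0.5 * ls
    rt = r + 0.707 * c + voice + 0.5 * rs
    return lt, rt

# dlev = 1.0: no dialogue boost.
lt, rt = ndmix_frame(0.1, 0.2, 0.3, 0.4, 0.5, e=0.3, dlev=1.0)

# dlev = 1.5: half of the extracted voice is added to both outputs.
lt_boost, rt_boost = ndmix_frame(0.1, 0.2, 0.3, 0.4, 0.5, e=0.3, dlev=1.5)
```

Note that the formula itself does nothing about clipping, so the output may still need attenuation or normalization afterwards.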

Someone on the Hydrogenaudio forums implemented it this way:

ffmpeg -i 6chan-input.wav -af "pan=stereo|FL < 1.0FL + 0.707FC + 0.707BL|FR < 1.0FR + 0.707FC + 0.707BR" -ac copy stereo.wav

Do you think this is correct (and proper)?

Nov 06 '20 10:11 MarcoRavich

@forart this project is very dead.

The decoder was moved to ffmpeg, talk to them.

Nov 06 '20 10:11 MarcusJohnson91

DCADec hasn’t been updated in like 4-5 years, it’s been merged into FFmpeg.

In FFmpeg you’re looking to remap the channels, search that.

On Feb 1, 2022, at 8:44 PM, damian101 @.***> wrote:

I just use the same command I use for 6.1 and 7.1 too: -af 'lowpass=c=LFE:f=120,pan=stereo|FL=.3FL+.21FC+.3FLC+.3SL+.3BL+.21BC+.21LFE|FR=.3FR+.21FC+.3FRC+.3SR+.3BR+.21BC+.21LFE' Maybe not ideal loudness-wise for 5.1, but I usually normalize to -23 LUFS anyway.


Feb 02 '22 02:02 MarcusJohnson91
