How to downmix 5.1 into stereo properly? I get overflow even with the -3 dB matrix when summing the individual output channels.

Open eviluess opened this issue 9 years ago • 15 comments

Hey! First of all, thanks for your work on this. I tried to get a stereo downmix of DTS files. However, I found that the -2 option does not work if the DTS file doesn't contain the downmix coefficients. So I tried to sum the individual audio channels with the following formula: Lo = FL + 0.707*(C+SL); Ro = FR + 0.707*(C+SR); This makes the final stereo file distorted. I compared with ffmpeg. Its downmix seems to make the volume very low (about -9 dB).

So how should I do the downmix properly on DTS files that do not contain coefficient info?

Looking forward to your reply. Thanks!

eviluess avatar Jun 18 '15 09:06 eviluess

Are you sure you're not supposed to subtract 0.707 from the center and rear channels, before combining them into the output stereo channel?

MarcusJohnson91 avatar Jul 02 '15 20:07 MarcusJohnson91

Do you mean the correct formula should be: Lo = FL - 0.707*(C+SL); Ro = FR - 0.707*(C+SR);

eviluess avatar Jul 03 '15 07:07 eviluess

I mean Lo = FL + (C - 0.707), + (SL - 0.707)

MarcusJohnson91 avatar Jul 03 '15 08:07 MarcusJohnson91

I'm confused. I think this would introduce a DC offset of -1.414.

eviluess avatar Jul 03 '15 08:07 eviluess

Subtraction is clearly the wrong way to go. You are supposed to multiply C by 0.707 in any case, since that compensates for cloning the (single) C channel into two channels. Most people also multiply the surrounds by that to keep them from interfering with the front sounds too much (i.e. your original formula).

However, this formula can cause overflows if the original audio signal is already at full volume. The only way to reliably combat this is to reduce the overall volume to avoid clipping - this is what ffmpeg does. It will result in a somewhat quieter track, but it's the only way to be absolutely sure that it will never overflow or clip.
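To make the overflow concrete, here is a minimal sketch of the formula from the original post applied to one frame of float samples in [-1.0, 1.0]. The function names are made up for illustration and the LFE channel is simply dropped, which is an assumption, not something dcadec or ffmpeg is claimed to do.

```python
K = 0.707  # about 1/sqrt(2), i.e. -3 dB


def downmix_frame(fl, fr, c, sl, sr):
    """Return (Lo, Ro) for one frame of float samples; LFE is ignored (assumption)."""
    lo = fl + K * (c + sl)
    ro = fr + K * (c + sr)
    return lo, ro


def clip(x):
    """Hard-limit a sample to [-1.0, 1.0]; this limiting is what you hear as distortion."""
    return max(-1.0, min(1.0, x))


# A quiet frame stays inside the legal range...
quiet = downmix_frame(0.2, 0.2, 0.2, 0.2, 0.2)

# ...but a frame with every channel at full scale does not.
loud = downmix_frame(1.0, 1.0, 1.0, 1.0, 1.0)

print(quiet, loud, clip(loud[0]))
```

With all five channels at full scale the sum is 1.0 + 0.707 + 0.707, well above 1.0, so the result has to be either clipped or scaled down.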

Nevcairiel avatar Jul 03 '15 08:07 Nevcairiel

The only way to reliably combat this is to reduce the overall volume to avoid clipping - this is what ffmpeg does.

Actually, libswresample and libavresample have different defaults. libswresample doesn't do it AFAIK, which will sometimes result in clipping. I guess it boils down to the user's preference: get audio that's "too silent", or audio that might clip.

ghost avatar Jul 03 '15 09:07 ghost

So there's no way to produce a downmix with unreduced gain and without any clipping?

eviluess avatar Jul 03 '15 09:07 eviluess

libswresample doesn't do it AFAIK, which will sometimes result in clipping.

swresample is a bit dumb. It does it when you use integer internal/output, but doesn't with float output, so yeah.

Nevcairiel avatar Jul 03 '15 09:07 Nevcairiel

So there's no way to produce a downmix with unreduced gain and without any clipping?

Not without analyzing the audio first to get its real peak information. A general 1-pass operation has to assume that all channels can contain full-range audio at the same time, and if you combine 1.0 + 0.707 + 0.707 it will overflow, so you have to reduce the volume, i.e. effectively divide by 2.414 (which is about 7.7 dB of reduction, iirc).
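The two options described here can be sketched as follows: a fixed worst-case attenuation for a 1-pass downmix, and a 2-pass variant that first measures the real peak of the unscaled downmix. This is only an illustration under the assumption of float samples in [-1.0, 1.0]; the function names are hypothetical and do not come from dcadec or ffmpeg.

```python
import math

COEFFS = (1.0, 0.707, 0.707)  # front, centre, surround gains from the thread


def worst_case_attenuation_db(coeffs=COEFFS):
    """Attenuation (in dB) needed if every channel can be at full scale at once."""
    total = sum(abs(c) for c in coeffs)
    return 20.0 * math.log10(total)


def peak_normalised_gain(frames, coeffs=COEFFS):
    """First pass of a 2-pass downmix: find the real peak, return the gain to apply.

    `frames` is an iterable of (fl, fr, c, sl, sr) tuples. The gain never boosts.
    """
    front, centre, surround = coeffs
    peak = 0.0
    for fl, fr, c, sl, sr in frames:
        lo = front * fl + centre * c + surround * sl
        ro = front * fr + centre * c + surround * sr
        peak = max(peak, abs(lo), abs(ro))
    return 1.0 if peak <= 1.0 else 1.0 / peak


print(round(worst_case_attenuation_db(), 2))
print(peak_normalised_gain([(0.5, 0.5, 0.5, 0.5, 0.5)]))
```

The worst-case figure works out to roughly 7.65 dB. The 2-pass gain is only as small as the material actually requires, which is why it can be louder than the fixed attenuation, at the cost of reading the whole stream before writing any output.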

Nevcairiel avatar Jul 03 '15 09:07 Nevcairiel

That's why I started the discussion.

I found that some players request a 5.1 channel configuration from the sound card through the waveOutOpen API by setting dwChannelMask to the corresponding channel flags (0x3F). This does not lead to any overflow even when I turn both the player volume and the system volume up to the maximum. The sound is also louder than when the player directly plays the downmix generated by ffmpeg (-7 dB).

So it seems that the sound card can do the downmix correctly? It has no chance to run a peak-scanning pass first.

eviluess avatar Jul 03 '15 11:07 eviluess

Hi there, sorry to revive this old discussion, but I'm looking for someone able to correctly implement the "Novel 5.1 Downmix Algorithm with Improved Dialogue Intelligibility" research: ndmix

Similar to the state-of-the-art downmix methods, only 5 channels are taken into consideration: L, R, C, Ls and Rs. We can represent the downmix operation in the form of the following equation:

lt[n] = l[n] + 0.707 * c[n] + (dlev - 1) * e[n] + 0.5 * ls[n]
rt[n] = r[n] + 0.707 * c[n] + (dlev - 1) * e[n] + 0.5 * rs[n]

where e[n] is the extracted voice signal, dlev represents the dialogue level and all considered signals are represented in the digital domain, in which n denotes the sample index.
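As a rough per-sample sketch of the two equations above: the extracted voice signal e[n] is treated here as an input that already exists, since the quoted text does not say how it is obtained. The function name `ndmix_frame` is hypothetical.

```python
def ndmix_frame(l, r, c, ls, rs, e, dlev):
    """Apply the quoted downmix equations to one sample index n.

    e    -- extracted voice sample (assumed to be provided by some other stage)
    dlev -- dialogue level; dlev = 1.0 removes the extra voice term entirely
    """
    centre = 0.707 * c + (dlev - 1.0) * e  # shared by both output channels
    lt = l + centre + 0.5 * ls
    rt = r + centre + 0.5 * rs
    return lt, rt


# dlev = 1.0: conventional downmix with the surrounds at 0.5
plain = ndmix_frame(0.1, 0.2, 0.3, 0.4, 0.5, e=0.3, dlev=1.0)

# dlev = 2.0: the extracted voice is added once more on top of the centre
boosted = ndmix_frame(0.1, 0.2, 0.3, 0.4, 0.5, e=0.3, dlev=2.0)

print(plain, boosted)
```

Like the simpler matrix, this sum can exceed full scale, so the clipping discussion earlier in the thread still applies.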

Someone @ Hydrogenaudio forums implemented it in this way:

ffmpeg -i 6chan-input.wav -af "pan=stereo|FL < 1.0FL + 0.707FC + 0.707BL|FR < 1.0FR + 0.707FC + 0.707BR" -ac copy stereo.wav

Do you think this is correct (and proper)?
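For comparison with the equations quoted above, note that the command uses 0.707 for the surrounds where the equations use 0.5, and it has no (dlev - 1) * e[n] term. In ffmpeg's pan filter, writing `<` instead of `=` asks for the gains of that output channel to be renormalized so that they total 1, which avoids clipping. The sketch below only mimics that renormalization, assuming it divides by the sum of the absolute gains; it does not call ffmpeg.

```python
def renormalise(gains):
    """Scale a channel's gains so they sum to 1 (assumed behaviour of pan's '<')."""
    total = sum(abs(g) for g in gains.values())
    return {name: g / total for name, g in gains.items()}


left = renormalise({"FL": 1.0, "FC": 0.707, "BL": 0.707})
right = renormalise({"FR": 1.0, "FC": 0.707, "BR": 0.707})

for name, gain in left.items():
    print(name, round(gain, 3))
```

The effective front gain becomes 1 / 2.414, which is the same worst-case division discussed earlier in the thread.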

MarcoRavich avatar Nov 06 '20 10:11 MarcoRavich

@forart this project is very dead.

The decoder was moved to ffmpeg, talk to them.

MarcusJohnson91 avatar Nov 06 '20 10:11 MarcusJohnson91

DCADec hasn’t been updated in like 4-5 years, it’s been merged into FFmpeg.

In FFmpeg you’re looking to remap the channels, search that.

On Feb 1, 2022, at 8:44 PM, damian101 wrote:

 I just use the same command I use for 6.1 and 7.1 too: -af 'lowpass=c=LFE:f=120,pan=stereo|FL=.3FL+.21FC+.3FLC+.3SL+.3BL+.21BC+.21LFE|FR=.3FR+.21FC+.3FRC+.3SR+.3BR+.21BC+.21LFE' Maybe not ideal loudness-wise for 5.1, but I usually normalize to -23 LUFS anyway.

MarcusJohnson91 avatar Feb 02 '22 02:02 MarcusJohnson91