Mp3 is quiet after normalizing
Checklist (please tick all boxes)
- [x] I am using the latest version of `ffmpeg-normalize` (run `pip3 install --upgrade ffmpeg-normalize`)
- [x] I am using the latest stable version of `ffmpeg` or a recent build from Git master
Expected behavior: I'm normalizing an MP3 with -t -14 and -lrt 11.
Actual behavior: The converted MP3 is quiet.
File to reproduce: https://return0.de/interview.mp3
Command The exact command you were trying to run:
ffmpeg-normalize -t -14 -lrt 11 interview.mp3 -c:a libmp3lame -b:a 128k -o output.mp3
Any output you get when running the command with the --debug flag:
DEBUG: Running command: ['C:\\Users\\David\\Documents\\project-tools\\ffmpeg-master-latest-win64-gpl\\bin\\ffmpeg.EXE', '-filters']
DEBUG: Parsing streams of interview.mp3
DEBUG: Running command: ['C:\\Users\\David\\Documents\\project-tools\\ffmpeg-master-latest-win64-gpl\\bin\\ffmpeg.EXE', '-i', 'interview.mp3', '-c', 'copy', '-t', '0', '-map', '0', '-f', 'null', 'NUL']
DEBUG: Stream parsing command output:
DEBUG: ffmpeg version N-108116-g50a4dff69f-20220913 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.1.0 (crosstool-NG 1.25.0.55_3defb7b)
configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --disable-w32threads --enable-pthreads --enable-iconv --enable-libxml2 --enable-zlib --enable-libfre
etype --enable-libfribidi --enable-gmp --enable-lzma --enable-fontconfig --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-libdav1d --enable-libdavs2 --disable-libfdk-aac --enable-f
fnvcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-lv2 --enable-libmfx --enable-libopencore-amrnb -
-enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --disable-vaapi --enable-libvidstab --ena
ble-vulkan --enable-libshaderc --enable-libplacebo --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-ldflags=-pthread --extra-ldexeflags= --extra-libs=-lgomp --extra-version=20220913
libavutil 57. 36.101 / 57. 36.101
libavcodec 59. 43.100 / 59. 43.100
libavformat 59. 31.100 / 59. 31.100
libavdevice 59. 8.101 / 59. 8.101
libavfilter 8. 48.100 / 8. 48.100
libswscale 6. 8.112 / 6. 8.112
libswresample 4. 9.100 / 4. 9.100
libpostproc 56. 7.100 / 56. 7.100
Input #0, mp3, from 'interview.mp3':
Metadata:
encoded_by : Switch Testversion © NCH Software
genre : Speech
title : <anonymized>
date : 2022
Duration: 00:50:52.98, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Output #0, null, to 'NUL':
Metadata:
encoded_by : Switch Testversion © NCH Software
genre : Speech
title : <anonymized>
date : 2022
encoder : Lavf59.31.100
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
size=N/A time=-577014:32:22.77 bitrate=N/A speed=N/A s/s speed=N/A
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Output file is empty, nothing was encoded
DEBUG: Found duration: 3052.098 s
DEBUG: Found audio stream at index 0
INFO: Normalizing file interview.mp3 (1 of 1)
DEBUG: Running normalization for interview.mp3
DEBUG: Parsing normalization info for interview.mp3
INFO: Running first pass loudnorm filter for stream 0
DEBUG: Running command: ['C:\\Users\\David\\Documents\\project-tools\\ffmpeg-master-latest-win64-gpl\\bin\\ffmpeg.EXE', '-nostdin', '-y', '-i', 'interview.mp3', '-filter_complex', '[0:0]loudnorm=i=-14.0:lra=11.0:tp=-2.0:offset=0.0:print_format=json', '-vn', '-sn', '-f', 'null', 'NUL']
DEBUG: ffmpeg output: ffmpeg version N-108116-g50a4dff69f-20220913 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.1.0 (crosstool-NG 1.25.0.55_3defb7b)
configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --disable-w32threads --enable-pthreads --enable-iconv --enable-libxml2 --enable-zlib --enable-libfreet
ype --enable-libfribidi --enable-gmp --enable-lzma --enable-fontconfig --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-libdav1d --enable-libdavs2 --disable-libfdk-aac --enable-ffn
vcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-lv2 --enable-libmfx --enable-libopencore-amrnb --e
nable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --disable-vaapi --enable-libvidstab --enabl
e-vulkan --enable-libshaderc --enable-libplacebo --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-ldflags=-pthread --extra-ldexeflags= --extra-libs=-lgomp --extra-version=20220913
libavutil 57. 36.101 / 57. 36.101
libavcodec 59. 43.100 / 59. 43.100
libavformat 59. 31.100 / 59. 31.100
libavdevice 59. 8.101 / 59. 8.101
libavfilter 8. 48.100 / 8. 48.100
libswscale 6. 8.112 / 6. 8.112
libswresample 4. 9.100 / 4. 9.100
libpostproc 56. 7.100 / 56. 7.100
Input #0, mp3, from 'interview.mp3':
Metadata:
encoded_by : Switch Testversion © NCH Software
genre : Speech
title : <anonymized>
date : 2022
Duration: 00:50:52.98, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 (mp3float) -> loudnorm:default
loudnorm:default -> Stream #0:0 (pcm_s16le)
Output #0, null, to 'NUL':
Metadata:
encoded_by : Switch Testversion © NCH Software
genre : Speech
title : <anonymized>
date : 2022
encoder : Lavf59.31.100
Stream #0:0: Audio: pcm_s16le, 192000 Hz, mono, s16, 3072 kb/s
Metadata:
encoder : Lavc59.43.100 pcm_s16le
video:0kB audio:1144868kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_loudnorm_0 @ 000001f4daa97e40]
{
"input_i" : "-0.08",
"input_tp" : "84.92",
"input_lra" : "4.20",
"input_thresh" : "-19.69",
"output_i" : "-23.10",
"output_tp" : "-2.00",
"output_lra" : "7.20",
"output_thresh" : "-33.77",
"normalization_type" : "dynamic",
"target_offset" : "9.10"
}
DEBUG: Loudnorm first pass command output: ffmpeg version N-108116-g50a4dff69f-20220913 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.1.0 (crosstool-NG 1.25.0.55_3defb7b)
configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --disable-w32threads --enable-pthreads --enable-iconv --enable-libxml2 --enable-zlib --enable-libfreet
ype --enable-libfribidi --enable-gmp --enable-lzma --enable-fontconfig --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-libdav1d --enable-libdavs2 --disable-libfdk-aac --enable-ffn
vcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-lv2 --enable-libmfx --enable-libopencore-amrnb --e
nable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --disable-vaapi --enable-libvidstab --enabl
e-vulkan --enable-libshaderc --enable-libplacebo --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-ldflags=-pthread --extra-ldexeflags= --extra-libs=-lgomp --extra-version=20220913
libavutil 57. 36.101 / 57. 36.101
libavcodec 59. 43.100 / 59. 43.100
libavformat 59. 31.100 / 59. 31.100
libavdevice 59. 8.101 / 59. 8.101
libavfilter 8. 48.100 / 8. 48.100
libswscale 6. 8.112 / 6. 8.112
libswresample 4. 9.100 / 4. 9.100
libpostproc 56. 7.100 / 56. 7.100
Input #0, mp3, from 'interview.mp3':
Metadata:
encoded_by : Switch Testversion © NCH Software
genre : Speech
title : <anonymized>
date : 2022
Duration: 00:50:52.98, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 (mp3float) -> loudnorm:default
loudnorm:default -> Stream #0:0 (pcm_s16le)
Output #0, null, to 'NUL':
Metadata:
encoded_by : Switch Testversion © NCH Software
genre : Speech
title : <anonymized>
date : 2022
encoder : Lavf59.31.100
Stream #0:0: Audio: pcm_s16le, 192000 Hz, mono, s16, 3072 kb/s
Metadata:
encoder : Lavc59.43.100 pcm_s16le
video:0kB audio:1144868kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_loudnorm_0 @ 000001f4daa97e40]
{
"input_i" : "-0.08",
"input_tp" : "84.92",
"input_lra" : "4.20",
"input_thresh" : "-19.69",
"output_i" : "-23.10",
"output_tp" : "-2.00",
"output_lra" : "7.20",
"output_thresh" : "-33.77",
"normalization_type" : "dynamic",
"target_offset" : "9.10"
}
DEBUG: Loudnorm stats parsed: {"input_i": "-0.08", "input_tp": "84.92", "input_lra": "4.20", "input_thresh": "-19.69", "output_i": "-23.10", "output_tp": "-2.00", "output_lra": "7.20", "output_thresh": "-33.77", "normalization_type": "dynamic", "target_offset": "9.10"}
INFO: Running second pass for interview.mp3
DEBUG: Running command: ['C:\\Users\\David\\Documents\\project-tools\\ffmpeg-master-latest-win64-gpl\\bin\\ffmpeg.EXE', '-y', '-nostdin', '-i', 'interview.mp3', '-filter_complex', '[0:0]loudnorm=i=-14.0:lra=11.0:tp=-2.0:offset=9.1:measured_i=-0.08:measured_lra=4.2:measured_tp=84.92:measured_thresh=-19.69:li
near=true:print_format=json[norm0]', '-map_metadata', '0', '-map_metadata:s:a:0', '0:s:a:0', '-map_chapters', '0', '-map', '[norm0]', '-c:a', 'libmp3lame', '-b:a', '128k', '-c:s', 'copy', 'C:\\Users\\David\\AppData\\Local\\Temp\\39gk8km0.mp3']
DEBUG: ffmpeg output: ffmpeg version N-108116-g50a4dff69f-20220913 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.1.0 (crosstool-NG 1.25.0.55_3defb7b)
configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --disable-w32threads --enable-pthreads --enable-iconv --enable-libxml2 --enable-zlib --enable-libfreet
ype --enable-libfribidi --enable-gmp --enable-lzma --enable-fontconfig --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-libdav1d --enable-libdavs2 --disable-libfdk-aac --enable-ffn
vcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-lv2 --enable-libmfx --enable-libopencore-amrnb --e
nable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --disable-vaapi --enable-libvidstab --enabl
e-vulkan --enable-libshaderc --enable-libplacebo --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-ldflags=-pthread --extra-ldexeflags= --extra-libs=-lgomp --extra-version=20220913
libavutil 57. 36.101 / 57. 36.101
libavcodec 59. 43.100 / 59. 43.100
libavformat 59. 31.100 / 59. 31.100
libavdevice 59. 8.101 / 59. 8.101
libavfilter 8. 48.100 / 8. 48.100
libswscale 6. 8.112 / 6. 8.112
libswresample 4. 9.100 / 4. 9.100
libpostproc 56. 7.100 / 56. 7.100
Input #0, mp3, from 'interview.mp3':
Metadata:
encoded_by : Switch Testversion © NCH Software
genre : Speech
title : <anonymized>
date : 2022
Duration: 00:50:52.98, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 (mp3float) -> loudnorm:default
loudnorm:default -> Stream #0:0 (libmp3lame)
Output #0, mp3, to 'C:\Users\David\AppData\Local\Temp\39gk8km0.mp3':
Metadata:
TENC : Switch Testversion © NCH Software
TCON : Speech
TIT2 : <anonymized>
TDRC : 2022
TSSE : Lavf59.31.100
Stream #0:0: Audio: mp3, 48000 Hz, mono, fltp, 128 kb/s
Metadata:
encoder : Lavc59.43.100 libmp3lame
video:0kB audio:47703kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.001220%
[Parsed_loudnorm_0 @ 000002700d18b740]
{
"input_i" : "-0.08",
"input_tp" : "84.92",
"input_lra" : "4.20",
"input_thresh" : "-19.69",
"output_i" : "-16.20",
"output_tp" : "-2.00",
"output_lra" : "5.30",
"output_thresh" : "-29.19",
"normalization_type" : "dynamic",
"target_offset" : "2.20"
}
DEBUG: Moving temporary file from C:\Users\David\AppData\Local\Temp\39gk8km0.mp3 to output.mp3
DEBUG: Normalization finished
INFO: Normalized file written to output.mp3
Environment (please complete the following information):
- [x] Your operating system: Windows 10
- [x] Your Python version / distribution (`python3 --version` or `python --version`): 3.10.6
- [x] Your ffmpeg version (`ffmpeg -version`): N-108116-g50a4dff69f-20220913
Hm. Just to make sure I didn't just break this. Could you please check if the previous release produces the same problem?
If yes, does it apply to all files or just this one?
The old release has the same problem. So far I have only seen it with this file.
In that case it might again be an issue with the original filter, which I cannot do much about. I know it has its quirks and should be improved (unfortunately it has not been maintained much recently).
I will check it out tomorrow!
Thank you. I do batch processing, so if you can fix it, that would be perfect. If you cannot fix this problem, it would be great if the program could raise an error. That would help identify problem files during batch processing.
Something is wrong with the entire conversion. Look at the original:

vs the output:

Essentially, the original you had was mixed very loud to begin with. The resulting file is missing a large chunk of the audio contents.
I think you should file a bug report on https://trac.ffmpeg.org/ for this particular sample.
Unfortunately, there is not much I can do about it!
Is it possible to "validate" the output MP3? Something like: if x% of the output MP3 is quiet, throw an error.
This would be possible with the silencedetect filter of ffmpeg and parsing the output in the shell, but really, it's obviously just a bug that needs to be fixed. I don't see that it's worth implementing such a check in this tool.
If you are looking for a script that implements "how many percent of a clip are silent?", and you have something started already, superuser.com may be a good resource for help.
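For anyone who wants to try the silencedetect approach mentioned above, here is a minimal sketch in Python rather than shell. It assumes the filter reports each silent stretch on stderr with a `silence_duration:` field; the function names, the -50 dB noise floor, and the 1 s minimum duration are hypothetical choices for illustration, not part of ffmpeg-normalize.

```python
import re
import subprocess

# Matches the duration field that silencedetect prints for each silent stretch.
SILENCE_DURATION_RE = re.compile(r"silence_duration:\s*([0-9.]+)")


def run_silencedetect(path, noise_db=-50.0, min_silence_s=1.0):
    """Run ffmpeg with the silencedetect filter and return its stderr text."""
    cmd = [
        "ffmpeg", "-nostdin", "-i", path,
        "-af", f"silencedetect=noise={noise_db}dB:d={min_silence_s}",
        "-f", "null", "-",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stderr


def silent_fraction(ffmpeg_stderr, total_duration_s):
    """Return the share of the file (0.0 to 1.0) reported as silent."""
    if total_duration_s <= 0:
        raise ValueError("total_duration_s must be positive")
    silent_s = sum(float(d) for d in SILENCE_DURATION_RE.findall(ffmpeg_stderr))
    return min(silent_s / total_duration_s, 1.0)
```

A batch script could call `silent_fraction(run_silencedetect("output.mp3"), 3052.098)` with the duration from the debug log and flag the file when the result exceeds a threshold of its choosing.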
This is more of a problem with the input MP3: there is a huge amplitude spike of 84.9 dBFS right at the start of the audio. It can be inspected with the astats or ebur128 filter. loudnorm gets entirely confused by this, so its dynamics processing/scanning gives wrong results while trying to compensate for such a huge spike. It can be fixed by using some kind of limiter as the first step in the processing chain.
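The spike is already visible in the first-pass statistics from the debug log above, where `input_tp` is reported as 84.92. As a rough sketch of how a batch script could turn that into an error, the check below parses the loudnorm JSON and rejects implausible true-peak readings. The function name and the 6 dB threshold are hypothetical; this is not something ffmpeg-normalize does itself.

```python
import json


def check_first_pass_stats(stats_json, max_plausible_tp_db=6.0):
    """Parse loudnorm first-pass JSON and raise if the true peak looks broken.

    max_plausible_tp_db is an assumed threshold: decoded lossy audio can
    overshoot 0 dBTP slightly, but tens of dB above full scale points to a
    corrupt sample or a decoding glitch.
    """
    stats = json.loads(stats_json)
    input_tp = float(stats["input_tp"])
    if input_tp > max_plausible_tp_db:
        raise ValueError(
            f"input true peak {input_tp:.2f} dBTP exceeds "
            f"{max_plausible_tp_db:.2f} dBTP; input likely contains a spike"
        )
    return stats
```

Fed the stats from this report, the function raises `ValueError`, which a batch job can catch to skip or log the file.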
@richardpl Thanks for your input! Is this just this MP3 or have you seen it with others?
This MP3 is the first time I have seen it, but it can be reproduced with any file that can store sample values beyond ±1.0. It just needs a single sample spike that is high enough.
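To put the number in perspective, a level in dB relative to full scale maps to linear amplitude via 20·log10. The sketch below treats the reported true-peak value as a plain sample peak, which is a simplification, and the helper names are made up for illustration.

```python
import math


def db_to_amplitude(level_db):
    """Convert a level in dB relative to full scale to linear amplitude."""
    return 10 ** (level_db / 20)


def amplitude_to_db(amplitude):
    """Convert a non-zero linear amplitude to dB relative to full scale."""
    if amplitude == 0:
        raise ValueError("amplitude must be non-zero")
    return 20 * math.log10(abs(amplitude))


# The peak from the debug log, expressed as a multiple of full scale (1.0).
spike = db_to_amplitude(84.92)
```

Here `spike` comes out at roughly 17,600 times full scale, a value that only a floating-point sample format (such as the `fltp` shown in the log) can carry; integer PCM would have clipped it at ±1.0.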
I guess I will close this as a one-off bug then, nothing that can be done from ffmpeg-normalize.