audioread
Incorrect number of channels with ffmpeg
Version: 2.1.9
It is possible to fool audioread into reporting 0 channels for an audio file that does in fact contain an audio stream.
This occurs when metadata in the file contains the string "audio:".
Test case:
$ ffmpeg -i test/data/test-2.mp3 -metadata description="audio: broken" out.mp3
$ python -c 'import audioread; print(audioread.audio_open("out.mp3", backends=[audioread.ffdec.FFmpegAudioFile]).channels)'
0
audioread assumes that the first line on stderr containing "audio:" is ffmpeg's stream information: https://github.com/beetbox/audioread/blob/5afc8a6dcb8ab801d19d67dc77fe8824ad04acb5/audioread/ffdec.py#L231
As seen in the following output, the description line containing "audio: broken" appears before "Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s":
$ ffmpeg -i out.mp3 -f s16le - > /dev/null
ffmpeg version 4.2.4-1ubuntu0.1 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Input #0, mp3, from 'out.mp3':
Metadata:
description : audio: broken
encoder : Lavf58.29.100
Duration: 00:00:02.04, start: 0.025057, bitrate: 129 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s
Metadata:
encoder : Lavc58.54
Stream mapping:
Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, s16le, to 'pipe:':
Metadata:
description : audio: broken
encoder : Lavf58.29.100
Stream #0:0: Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
Metadata:
encoder : Lavc58.54.100 pcm_s16le
size= 345kB time=00:00:02.00 bitrate=1411.2kbits/s speed= 410x
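To make the ordering problem concrete, here is a minimal sketch of a "first line containing audio:" search run over an excerpt of the output above. The function name first_audio_line_naive is hypothetical, and the loop is a simplified stand-in for the check described in this report, not audioread's actual code. The indentation in the sample lines is an assumption; only the text after stripping matters here.

```python
# Excerpt of the ffmpeg stderr shown above (leading whitespace is assumed).
STDERR_LINES = [
    "Input #0, mp3, from 'out.mp3':",
    "  Metadata:",
    "    description     : audio: broken",
    "    encoder         : Lavf58.29.100",
    "  Duration: 00:00:02.04, start: 0.025057, bitrate: 129 kb/s",
    "    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s",
]


def first_audio_line_naive(lines):
    """Return the first stripped, lowercased line that contains 'audio:'.

    Hypothetical helper: it mimics the substring check described in the
    report and is not copied from audioread.
    """
    for line in lines:
        normalized = line.strip().lower()
        if "audio:" in normalized:
            return normalized
    return None


naive_match = first_audio_line_naive(STDERR_LINES)
# The metadata line wins because it is printed before the stream line.
print(naive_match)
```

Because the matched line is the description field, it carries no channel layout such as "stereo", which is consistent with the channel count of 0 reported above.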
Wow; thanks for the detailed report! This looks like a tricky edge case; we can certainly make our parsing more robust.
One simple option would be to also require the line (after stripping) to start with "Stream". I'm not familiar enough with FFmpeg's output format to be certain that this is always how the output looks, but it would certainly rule out erroneous confusion with the "description" field.
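A rough sketch of that suggestion, assuming the stream line always begins with "Stream" after stripping (which, as noted, is not verified for every FFmpeg version). The names find_stream_audio_line and parse_channels are hypothetical, and the channel lookup covers only "mono" and "stereo" for illustration; this is not a patch against audioread itself.

```python
import re

# Excerpt of ffmpeg stderr from the report (leading whitespace is assumed).
SAMPLE_LINES = [
    "  Metadata:",
    "    description     : audio: broken",
    "  Duration: 00:00:02.04, start: 0.025057, bitrate: 129 kb/s",
    "    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s",
]


def find_stream_audio_line(lines):
    """Return the first line that starts with 'stream' and contains 'audio:'.

    Lines are stripped and lowercased before the check, so a metadata
    value such as 'audio: broken' no longer matches.
    """
    for line in lines:
        normalized = line.strip().lower()
        if normalized.startswith("stream") and "audio:" in normalized:
            return normalized
    return None


def parse_channels(stream_line):
    """Map the channel layout after the sample rate to a channel count.

    Illustrative only: handles 'mono' and 'stereo', returns 0 otherwise.
    """
    match = re.search(r"hz, ([^,]+),", stream_line)
    if match is None:
        return 0
    return {"mono": 1, "stereo": 2}.get(match.group(1), 0)


stream_line = find_stream_audio_line(SAMPLE_LINES)
channels = parse_channels(stream_line) if stream_line else 0
print(channels)
```

A metadata key that itself begins with "Stream" could still confuse this check, so a stricter pattern on the "Stream #N:N" prefix may be worth considering.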