Generated WAV maps its single channel to the FL speaker instead of mono
When playing back the generated WAV file in certain audio players, the sound is routed only to the left speaker. This seems to occur when the player relies on the speaker channel mapping embedded in the audio file. Although the file is mono, playback is mapped to a single speaker (front left) rather than both speakers, so the right speaker stays silent.
ffprobe .\musicgpt-generated.wav
ffprobe version 7.1-full_build-www.gyan.dev Copyright (c) 2007-2024 the FFmpeg developers
built with gcc 14.2.0 (Rev1, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libopenjpeg --enable-libquirc --enable-libuavs3d --enable-libxevd --enable-libzvbi --enable-libqrencode --enable-librav1e --enable-libsvtav1 --enable-libvvenc --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxeve --enable-libxvid --enable-libaom --enable-libjxl --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 --enable-libilbc --enable-libgsm --enable-liblc3 --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
libavutil 59. 39.100 / 59. 39.100
libavcodec 61. 19.100 / 61. 19.100
libavformat 61. 7.100 / 61. 7.100
libavdevice 61. 3.100 / 61. 3.100
libavfilter 10. 4.100 / 10. 4.100
libswscale 8. 3.100 / 8. 3.100
libswresample 5. 3.100 / 5. 3.100
libpostproc 58. 3.100 / 58. 3.100
Input #0, wav, from '.\musicgpt-generated.wav':
Duration: 00:00:09.94, bitrate: 1024 kb/s
Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 32000 Hz, 1 channels (FL), flt, 1024 kb/s
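The `(FL)` in the ffprobe output above suggests the file carries a WAVE_FORMAT_EXTENSIBLE header whose channel mask has only the front-left bit set. A minimal sketch for checking that directly, using only the Rust standard library, could look like the following. The function names `read_channel_mask`, `le_u32` and `extensible_header` are hypothetical helpers made up for this illustration; they are not part of hound or MusicGPT.

```rust
/// Speaker bit for front-left in the WAVE_FORMAT_EXTENSIBLE channel mask.
const SPEAKER_FRONT_LEFT: u32 = 0x1;
/// Format tag that marks a WAVE_FORMAT_EXTENSIBLE `fmt ` chunk.
const WAVE_FORMAT_EXTENSIBLE: u16 = 0xFFFE;

/// Reads a little-endian u32 at `at`, or `None` if out of bounds.
fn le_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let b = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns the channel mask stored in the `fmt ` chunk, or `None` when the
/// bytes are not RIFF/WAVE or the header is not WAVE_FORMAT_EXTENSIBLE.
fn read_channel_mask(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12usize;
    // Walk the RIFF chunks: 4-byte id, 4-byte little-endian size, then body.
    while pos.checked_add(8)? <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = le_u32(bytes, pos + 4)? as usize;
        let body = pos + 8;
        if id == b"fmt " {
            let chunk = bytes.get(body..body.checked_add(size)?)?;
            if chunk.len() < 24 {
                return None; // too short to hold dwChannelMask
            }
            let tag = u16::from_le_bytes([chunk[0], chunk[1]]);
            if tag != WAVE_FORMAT_EXTENSIBLE {
                return None; // plain header, no mask stored
            }
            // dwChannelMask sits at byte offset 20 of the chunk body.
            return le_u32(chunk, 20);
        }
        // Chunk bodies are padded to an even number of bytes.
        pos = body.checked_add(size)?.checked_add(size & 1)?;
    }
    None
}

/// Builds a minimal 32-bit float WAVE_FORMAT_EXTENSIBLE header with an empty
/// `data` chunk, only so the reader above can be exercised without a file.
fn extensible_header(channels: u16, channel_mask: u32) -> Vec<u8> {
    let sample_rate: u32 = 32_000;
    let bits: u16 = 32;
    let block_align: u16 = channels * (bits / 8);
    let mut fmt = Vec::new();
    fmt.extend_from_slice(&WAVE_FORMAT_EXTENSIBLE.to_le_bytes());
    fmt.extend_from_slice(&channels.to_le_bytes());
    fmt.extend_from_slice(&sample_rate.to_le_bytes());
    fmt.extend_from_slice(&(sample_rate * block_align as u32).to_le_bytes());
    fmt.extend_from_slice(&block_align.to_le_bytes());
    fmt.extend_from_slice(&bits.to_le_bytes());
    fmt.extend_from_slice(&22u16.to_le_bytes()); // cbSize
    fmt.extend_from_slice(&bits.to_le_bytes()); // wValidBitsPerSample
    fmt.extend_from_slice(&channel_mask.to_le_bytes());
    // SubFormat GUID for IEEE float: 00000003-0000-0010-8000-00aa00389b71
    fmt.extend_from_slice(&[
        0x03, 0, 0, 0, 0, 0, 0x10, 0, 0x80, 0, 0, 0xAA, 0, 0x38, 0x9B, 0x71,
    ]);

    let mut out = Vec::new();
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&((4 + 8 + fmt.len() + 8) as u32).to_le_bytes());
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&(fmt.len() as u32).to_le_bytes());
    out.extend_from_slice(&fmt);
    out.extend_from_slice(b"data");
    out.extend_from_slice(&0u32.to_le_bytes());
    out
}
```

To inspect a real file, one could pass the result of `std::fs::read("musicgpt-generated.wav")` to `read_channel_mask` and compare the returned value against `SPEAKER_FRONT_LEFT`.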
diff --git "a/.\\ffprobe-fl.txt" "b/.\\ffprobe-mono.txt"
index 3bdcb7a..a191f71 100644
--- "a/.\\ffprobe-fl.txt"
+++ "b/.\\ffprobe-mono.txt"
@@ -1,4 +1,4 @@
-ffprobe .\musicgpt-generated.wav
+ffprobe .\output_mono.wav
ffprobe version 7.1-full_build-www.gyan.dev Copyright (c) 2007-2024 the FFmpeg developers
built with gcc 14.2.0 (Rev1, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libopenjpeg --enable-libquirc --enable-libuavs3d --enable-libxevd --enable-libzvbi --enable-libqrencode --enable-librav1e --enable-libsvtav1 --enable-libvvenc --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxeve --enable-libxvid --enable-libaom --enable-libjxl --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 --enable-libilbc --enable-libgsm --enable-liblc3 --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
@@ -10,6 +10,8 @@ ffprobe version 7.1-full_build-www.gyan.dev Copyright (c) 2007-2024 the FFmpeg d
libswscale 8. 3.100 / 8. 3.100
libswresample 5. 3.100 / 5. 3.100
libpostproc 58. 3.100 / 58. 3.100
-Input #0, wav, from '.\musicgpt-generated.wav':
+Input #0, wav, from '.\output_mono.wav':
+ Metadata:
+ encoder : Lavf61.7.100
Duration: 00:00:09.94, bitrate: 1024 kb/s
- Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 32000 Hz, 1 channels (FL), flt, 1024 kb/s
\ No newline at end of file
+ Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 32000 Hz, 1 channels, flt, 1024 kb/s
\ No newline at end of file
The problem seems to come from hound, which by default assumes a mapping from the number of channels to specific speakers. I will try to open an issue in the hound repository.
https://github.com/ruuda/hound/blob/b5b6fbdd4ca29daa2cc3f8de7c3c57814c0f3207/src/write.rs#L124-L149
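To illustrate why such a default yields FL for a one-channel file, here is a small sketch. It assumes the mask is built by setting the lowest N bits for N channels; that is my reading of the behavior, not a quote of hound's code. The function names `low_bits_mask` and `mask_with_mono` are hypothetical, and treating one channel as front center is just one possible fix (writing a mask of 0, meaning unspecified, would be another), not necessarily what hound will adopt.

```rust
// Speaker position bits as defined for the WAVE_FORMAT_EXTENSIBLE channel mask.
const SPEAKER_FRONT_LEFT: u32 = 0x1;
const SPEAKER_FRONT_RIGHT: u32 = 0x2;
const SPEAKER_FRONT_CENTER: u32 = 0x4;

/// Assumed default: set the lowest `channels` bits of the mask.
/// With one channel this selects only the front-left speaker.
fn low_bits_mask(channels: u16) -> u32 {
    (0..channels.min(32)).fold(0u32, |mask, bit| mask | (1u32 << bit))
}

/// Hypothetical alternative: map a single channel to front center, the
/// position conventionally used for mono, and fall back to the low-bits
/// default for every other channel count.
fn mask_with_mono(channels: u16) -> u32 {
    match channels {
        1 => SPEAKER_FRONT_CENTER,
        n => low_bits_mask(n),
    }
}
```

With these definitions, `low_bits_mask(1)` is `0x1` (front left only), while `mask_with_mono(1)` is `0x4` (front center); for two channels both return `0x3`, i.e. front left plus front right.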
I think this issue should be kept open, at least until they fix it and release a new version.
👍 makes sense, thanks for reporting this!
@gabotechs glad to see you are still active. Thank you for the project and still hoping for that time extension some day to get it over the finish line, so to speak :)
https://github.com/ruuda/hound/pull/88
Before implementing such functionality, we need a way to handle the ONNX export with optimum for audio-input models like musicgen-melody. Without that we cannot create long-duration songs, so I opened an issue: https://github.com/huggingface/optimum/issues/2095
Fixed!
https://github.com/gabotechs/MusicGPT/pull/24