mumble icon indicating copy to clipboard operation
mumble copied to clipboard

Crash in Audio-Wizard

Open savchenko opened this issue 3 years ago • 25 comments

Describe the bug Mumble crashes when opening Audio-Wizard and pressing "next".
Affects 1.4rc1 and later (tested on tagged 1.4rc1 and master at the time of writing)

Steps to Reproduce Steps to reproduce the behavior:

  1. From the main menu: Configure → Audio Wizard
  2. Press next
  3. Application crashes

The issue can be reproduced with either ALSA or PulseAudio.

ALSA

<W>2021-11-10 11:35:14.960 AudioInput: Opus encoder set for high quality speech
<W>2021-11-10 11:35:14.960 AudioInput: 40000 bits/s, 48000 hz, 480 sample
<W>2021-11-10 11:35:14.960 ALSAAudioOutput: Initialized
<W>2021-11-10 11:35:14.960 ALSAAudioInput: Initing audiocapture default.
<W>2021-11-10 11:35:14.961 ALSAAudioInput: Actual buffer 48000 hz, 1 channel 3840 samples [480 per period]
<W>2021-11-10 11:35:14.961 AudioInput: Initialized mixer for 1 channel 48000 hz mic and 0 channel 48000 hz echo
<W>2021-11-10 11:35:14.961 ALSAAudioInput: Releasing ALSA Mic.
<W>2021-11-10 11:35:14.962 ALSAAudioOutput: ALSA reports 32 output channels. Clamping to 2.
<W>2021-11-10 11:35:14.962 ALSAAudioOutput: Actual buffer 48000 hz, 2 channel 2881 samples [480 per period]
<W>2021-11-10 11:35:14.962 ALSAAudioOutput: Initializing 2 channel, 48000 hz mixer
<W>2021-11-10 11:35:14.962 AudioOutput: Initialized 2 channel 48000 hz mixer
<W>2021-11-10 11:35:14.963 ALSAAudioOutput: Destroyed
<W>2021-11-10 11:35:14.963 AudioInput: Opus encoder set for high quality speech
<W>2021-11-10 11:35:14.963 AudioInput: 40000 bits/s, 48000 hz, 480 sample
<W>2021-11-10 11:35:14.963 ALSAAudioOutput: Initialized
<W>2021-11-10 11:35:14.963 ALSAAudioInput: Initing audiocapture default.
<W>2021-11-10 11:35:14.963 ALSAAudioInput: Actual buffer 48000 hz, 1 channel 3840 samples [480 per period]
<W>2021-11-10 11:35:14.963 AudioInput: Initialized mixer for 1 channel 48000 hz mic and 0 channel 48000 hz echo
<W>2021-11-10 11:35:14.964 ALSAAudioOutput: ALSA reports 32 output channels. Clamping to 2.
<W>2021-11-10 11:35:14.964 ALSAAudioOutput: Actual buffer 48000 hz, 2 channel 2881 samples [480 per period]
<W>2021-11-10 11:35:14.966 ALSAAudioOutput: Initializing 2 channel, 48000 hz mixer
<W>2021-11-10 11:35:14.966 AudioOutput: Initialized 2 channel 48000 hz mixer
<W>2021-11-10 11:35:14.974 AudioInput: Noise canceller disabled
warning: The VAD has been replaced by a hack pending a complete rewrite
Fatal (internal) error in /home/user/code/mumble/3rdparty/opus/celt/celt_decoder.c, line 118: assertion failed: st->mode == opus_custom_mode_create(48000, 960, NULL)
Aborted (core dumped)

Pulse

<W>2021-11-10 11:34:40.925 AudioInput: Opus encoder set for high quality speech
<W>2021-11-10 11:34:40.925 AudioInput: 40000 bits/s, 48000 hz, 480 sample
<W>2021-11-10 11:34:40.925 PulseAudio: Starting input alsa_input.usb-Solid_State_Logic_SSL_2-00.analog-stereo
<W>2021-11-10 11:34:40.925 PulseAudio: Starting output: alsa_output.usb-Solid_State_Logic_SSL_2-00.analog-stereo
<W>2021-11-10 11:34:40.926 AudioOutput: Initialized 2 channel 48000 hz mixer
<W>2021-11-10 11:34:40.926 AudioInput: Initialized mixer for 1 channel 48000 hz mic and 0 channel 48000 hz echo
<W>2021-11-10 11:34:40.934 Updating application palette
<W>2021-11-10 11:34:40.936 AudioInput: Noise canceller disabled
warning: The VAD has been replaced by a hack pending a complete rewrite
<W>2021-11-10 11:34:45.569 AudioInput: Opus encoder set for high quality speech
<W>2021-11-10 11:34:45.569 AudioInput: 40000 bits/s, 48000 hz, 480 sample
<W>2021-11-10 11:34:45.569 AudioInput: Opus encoder set for high quality speech
<W>2021-11-10 11:34:45.569 AudioInput: 40000 bits/s, 48000 hz, 480 sample
<W>2021-11-10 11:34:45.571 AudioInput: Initialized mixer for 1 channel 48000 hz mic and 0 channel 48000 hz echo
<W>2021-11-10 11:34:45.576 AudioInput: Noise canceller disabled
warning: The VAD has been replaced by a hack pending a complete rewrite
<W>2021-11-10 11:34:46.577 AudioOutput: Initialized 2 channel 48000 hz mixer
Fatal (internal) error in /home/user/code/mumble/3rdparty/opus/celt/celt_decoder.c, line 118: assertion failed: st->mode == opus_custom_mode_create(48000, 960, NULL)
Aborted (core dumped)

Expected behavior No crash.

Screenshots N/A

Desktop (please complete the following information):

  • OS: Debian 11
  • Version: 1.4rc1++

Additional context

  • There is only one audio device connected to the system. In-build audio was specifically disabled.

  • Build flags:

    cmake
    -DRELEASE_ID=$(python "../scripts/mumble-version.py")
    -Dserver=OFF
    -Doverlay=OFF
    -Dcoreaudio=OFF
    -Ddbus=OFF
    -Dbundle-qt-translations=OFF
    -Dasio=OFF
    -Dgkey=OFF
    -Dg15=OFF
    -Djackaudio=OFF
    -Doss=OFF
    -Dpipewire=OFF
    -Dportaudio=OFF
    -Dsymbols=OFF
    -Dwasapi=OFF
    -Dxboxinput=OFF
    -Dxinput2=OFF
    -Dzeroconf=OFF
    -Dpulseaudio=ON
    -Dalsa=OFF
    -DCMAKE_BUILD_TYPE=Release
    

savchenko avatar Nov 10 '21 01:11 savchenko

Working set of parameters for PA-only build:

cmake
-DCMAKE_BUILD_TYPE=Release
-DOPUS_STACK_PROTECTOR=ON
-DOPUS_X86_PRESUME_AVX=ON
-DOPUS_X86_PRESUME_SSE4_1=ON
-DRELEASE_ID=$(python "../scripts/mumble-version.py")
-Dalsa=OFF
-Dasio=OFF
-Dbundle-qt-translations=OFF
-Dcoreaudio=OFF
-Dcrash-report=OFF
-Ddbus=OFF
-Dg15=OFF
-Dgkey=OFF
-Djackaudio=OFF
-Doptimize=ON
-Doss=OFF
-Doverlay=OFF
-Dpipewire=OFF
-Dplugins=OFF
-Dportaudio=OFF
-Dpulseaudio=ON
-Dserver=OFF
-Dspeechd=OFF
-Dstatic=OFF
-Dsymbols=OFF
-Dtranslations=OFF
-Dupdate=OFF
-Dwasapi=OFF
-Dxboxinput=OFF
-Dxinput2=OFF
-Dzeroconf=OFF ..

Still trying to narrow-down the root cause of https://github.com/mumble-voip/mumble/issues/5300.

savchenko avatar Nov 10 '21 06:11 savchenko

Why did you close this again? :thinking:

Krzmbrzl avatar Nov 10 '21 07:11 Krzmbrzl

FWIW I can't reproduce this on my machine. However, I did not explicitly disable any audio devices (I don't know how that would work). With these compile options, Mumble was set to use PulseAudio (with no option to switch to anything else)

Krzmbrzl avatar Nov 10 '21 09:11 Krzmbrzl

Why did you close this again? thinking

Cost/benefit. As noted in https://github.com/mumble-voip/mumble/issues/5302#issuecomment-964833310 the application works fine with ALSA removed from the equation.

I don't want to waste your time if this is something relevant solely to my combination of Linux distribution and/or hardware.

savchenko avatar Nov 10 '21 23:11 savchenko

This happened with me too. Using -Dalsa=OFF didn't help. The error points out this line in the opus repo.

Please don't close issues in haste when you got it working but the bug/problem has not been narrowed down.

Unrelated: Also, I can't see PipeWire in Interface options even though the flag is ON by default.

siddhpant avatar Nov 12 '21 17:11 siddhpant

@siddhpant do you happen to have a full backtrace that you could share with us?

Krzmbrzl avatar Nov 13 '21 08:11 Krzmbrzl

@siddhpant do you happen to have a full backtrace that you could share with us? ~ @Krzmbrzl

Built using these commands:
$ cmake \
    -DCMAKE_UNITY_BUILD=ON \
    -Dcrash-report=OFF \
    -Doptimize=ON \
    -Dplugins=OFF \
    -Dqtspeech=ON \
    -Dtranslations=OFF \
    -Dwasapi=OFF \
    -Dxboxinput=OFF \
    ..

$ make

$ make install

Then started Mumble and ran the audio wizard, which expectedly crashes again on clicking Next.

Terminal output:
<D>2021-11-13 14:53:37.265 libopus 1.3.1-97-g6b6035ae from /usr/local/lib/mumble/libopus.so.0
<W>2021-11-13 14:53:37.276 CELT bitstream 8000000b from /usr/local/lib/mumble/libcelt0.so.0.7.0
<D>2021-11-13 14:53:37.282 PulseAudio 15.0.0 from libpulse.so.0
ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_a52.c:835:(_snd_pcm_a52_open) a52 is only for playback
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
<D>2021-11-13 14:53:37.470 PortAudio V19.6.0-devel, revision 396fe4b6699ae929d3a685b3ef8a7e97396139a4 from libportaudio.so.2
<D>2021-11-13 14:53:37.471 JACK 1.9.19 from libjack.so.0
<W>2021-11-13 14:53:37.498 Theme: "Mumble"
<W>2021-11-13 14:53:37.498 Style: "Dark"
<W>2021-11-13 14:53:37.498 --> qss: ":themes/Default/Dark.qss"
<W>2021-11-13 14:53:37.499 Locale is "en_IN" (System: "en_IN")
<W>2021-11-13 14:53:37.499 Unable to find bundled translations for locale "en_IN"
<W>2021-11-13 14:53:37.677 Database SQLite: "3.36.0"
<W>2021-11-13 14:53:37.711 GlobalShortcutX: Using XI2 2.3
<W>2021-11-13 14:53:37.994 QMetaObject::connectSlotsByName: No matching signal for on_qtvUsers_customContextMenuRequested(QPoint,bool)
<W>2021-11-13 14:53:41.572 AudioInput: Opus encoder set for low delay
<W>2021-11-13 14:53:41.574 AudioInput: 192000 bits/s, 48000 hz, 480 sample
<W>2021-11-13 14:53:41.574 PulseAudio: Starting input alsa_input.pci-0000_00_1b.0.analog-stereo
<W>2021-11-13 14:53:41.574 PulseAudio: Starting echo: alsa_output.pci-0000_00_1b.0.analog-stereo.monitor
<W>2021-11-13 14:53:41.597 Public list disabled
<W>2021-11-13 14:53:41.636 Zeroconf: Third-party browser API reports error -65537
<W>2021-11-13 14:53:41.666 PulseAudio: Starting output: alsa_output.pci-0000_00_1b.0.analog-stereo
<W>2021-11-13 14:53:41.677 AudioInput: Initialized mixer for 1 channel 48000 hz mic and 0 channel 48000 hz echo
<W>2021-11-13 14:53:41.682 AudioInput: Using RNNoise as noise canceller
warning: The VAD has been replaced by a hack pending a complete rewrite
<W>2021-11-13 14:53:41.695 AudioInput: Initialized mixer for 1 channel 48000 hz mic and 1 channel 48000 hz echo
<W>2021-11-13 14:53:41.723 AudioOutput: Initialized 2 channel 48000 hz mixer
<W>2021-11-13 14:53:41.727 AudioInput: Using RNNoise as noise canceller
warning: The VAD has been replaced by a hack pending a complete rewrite
<W>2021-11-13 14:53:41.727 AudioInput: ECHO CANCELLER ACTIVE
<W>2021-11-13 14:53:42.100 Updating application palette
<W>2021-11-13 14:53:42.156 Updating application palette
<W>2021-11-13 14:53:45.421 AudioInput: Opus encoder set for low delay
<W>2021-11-13 14:53:45.421 AudioInput: 192000 bits/s, 48000 hz, 480 sample
<W>2021-11-13 14:53:45.426 AudioInput: Opus encoder set for low delay
<W>2021-11-13 14:53:45.426 AudioInput: 192000 bits/s, 48000 hz, 480 sample
<W>2021-11-13 14:53:45.427 AudioInput: Initialized mixer for 0 channel 48000 hz mic and 1 channel 48000 hz echo
<W>2021-11-13 14:53:45.427 AudioOutput: Initialized 2 channel 48000 hz mixer
<W>2021-11-13 14:53:45.431 AudioInput: Initialized mixer for 1 channel 48000 hz mic and 1 channel 48000 hz echo
<W>2021-11-13 14:53:45.454 AudioInput: Using RNNoise as noise canceller
warning: The VAD has been replaced by a hack pending a complete rewrite
<W>2021-11-13 14:53:45.454 AudioInput: ECHO CANCELLER ACTIVE
Fatal (internal) error in /tmp/mumble/3rdparty/opus/celt/celt_decoder.c, line 118: assertion failed: st->mode == opus_custom_mode_create(48000, 960, NULL)
Aborted
System details:
$ uname -a
Linux peesee 5.14.0-kali2-amd64 #1 SMP Debian 5.14.9-2kali1 (2021-10-04) x86_64 GNU/Linux

$ pactl info | grep "^Server Name"
Server Name: PulseAudio (on PipeWire 0.3.39)

siddhpant avatar Nov 13 '21 09:11 siddhpant

Surprisingly, when I build a deb package and then install, the client works and the Audio Wizard has ran successfully.

Commands used to build:
$ release_id=1.5.0~2021-11-11~ge54798cbd~snapshot
$ version=1.5.0.0

$ cmake \
    -DBUILD_NUMBER=0 \
    -DRELEASE_ID=$release_id \
    -DCMAKE_UNITY_BUILD=ON \
    -Dalsa=OFF \
    -Dcrash-report=OFF \
    -Doptimize=ON \
    -Dpackaging=ON \
    -Dplugins=OFF \
    -Dqtspeech=ON \
    -Dtranslations=OFF \
    -Dwasapi=OFF \
    -Dxboxinput=OFF \
    ..

$ sudo checkinstall \
    --install=yes \
    --pkgname=mumble \
    --pkgversion=$version \
    --pkgrelease=$release_id

siddhpant avatar Nov 13 '21 10:11 siddhpant

I get the same crash and the same output on: OpenSUSE Tumbleweed Linux 5.14.14-2-default

I'm on the master branch, latest commit.

powerjungle avatar Nov 14 '21 15:11 powerjungle

Can you try building the package in the way I mentioned above?

siddhpant avatar Nov 14 '21 16:11 siddhpant

For the reference, here is my build recipe:

git clone https://github.com/mumble-voip/mumble/

git submodule update --init --recursive

git pull --all --recurse-submodules

cd mumble && mkdir build && cd build

cmake \
-DCMAKE_BUILD_TYPE=Release \
-DOPUS_STACK_PROTECTOR=ON \
-DOPUS_X86_PRESUME_AVX=ON \
-DOPUS_X86_PRESUME_SSE4_1=ON \
-DRELEASE_ID=$(python "../scripts/mumble-version.py") \
-Dalsa=OFF \
-Dasio=OFF \
-Dbundle-qt-translations=OFF \
-Dcoreaudio=OFF \
-Dcrash-report=OFF \
-Ddbus=OFF \
-Dg15=OFF \
-Dgkey=OFF \
-Djackaudio=OFF \
-Doptimize=ON \
-Doss=OFF \
-Doverlay=OFF \
-Dpipewire=OFF \
-Dplugins=OFF \
-Dportaudio=OFF \
-Dpulseaudio=ON \
-Dserver=OFF \
-Dspeechd=OFF \
-Dstatic=OFF \
-Dsymbols=OFF \
-Dtranslations=OFF \
-Dupdate=OFF \
-Dwasapi=OFF \
-Dxboxinput=OFF \
-Dxinput2=OFF \
-Dzeroconf=OFF ..

cmake --build . -j 8

savchenko avatar Nov 16 '21 11:11 savchenko

This is the error I usually get after a crash:

Fatal (internal) error in /home/powerjungle/Desktop/mumble/3rdparty/opus/celt/celt_decoder.c, line 118: assertion failed: st->mode == opus_custom_mode_create(48000, 960, NULL)

I noticed this in the cmake logs.

-- The following features have been disabled:

 * OPUS_CUSTOM_MODES, enable non-Opus modes, e.g. 44.1 kHz & 2^n frames.

So I decided to add this to my cmake command:

-DOPUS_CUSTOM_MODES=ON

and it stopped crashing for me at least for now.

I'm not sure what this mode is for, because I don't know how opus works, but this is where I'm at now.

powerjungle avatar Dec 02 '21 18:12 powerjungle

I think it would be helpful to get an actual backfrace from a debugger like gdb.

Krzmbrzl avatar Dec 02 '21 20:12 Krzmbrzl

@powerjungle @savchenko have you been connected to a server when using the audio wizard (and seeing the crash)?

This might be a race condition between regular Mumble receiving audio (using the Opus context object) and the audio wizard restarting the audio system (destroys the context object).

Krzmbrzl avatar Dec 11 '21 19:12 Krzmbrzl

@powerjungle @savchenko have you been connected to a server when using the audio wizard (and seeing the crash)?

No even without being connected to a server, it just stops the moment audio has to be output to the speakers.

This might be a race condition between regular Mumble receiving audio (using the Opus context object) and the audio wizard restarting the audio system (destroys the context object).

I don't think it's a problem in the audio wizard only, it's every time audio is output anywhere.

powerjungle avatar Dec 11 '21 20:12 powerjungle

Ah okay - too bad

Krzmbrzl avatar Dec 11 '21 20:12 Krzmbrzl

https://github.com/mumble-voip/mumble/issues/5302#issuecomment-964833310

I disabled OPUS_X86_PRESUME_AVX, because my CPU doesn't support it, but it didn't help, same result.

I got a hint through IRC at Libera from the #opus channel that we might be dealing with memory corruption overwriting the mode. Although trying to run mumble using valgrind gives a segmentation fault. No idea why and how this can be fixed, I've tried running other apps, they don't fault with my version. I've tried different builds of Mumble as well running through it, same result. Runs fine on gdb, but I didn't get anywhere using it so far. When it crashes this is the backtrace:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff54b1323 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007ffff5461366 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff544a7d3 in __GI_abort () at abort.c:79
#4  0x00007fffeaf3418e in celt_fatal
    (str=0x7fffeaf6d800 "assertion failed: st->mode == opus_custom_mode_create(48000, 960, NULL)", file=0x7fffeaf6d7b8 "/home/jungleman/Desktop/mumble/3rdparty/opus/celt/celt_decoder.c", line=118) at /home/jungleman/Desktop/mumble/3rdparty/opus/celt/arch.h:79
#5  0x00007fffeaf3f379 in validate_celt_decoder (st=0x7fffe40a1d10) at /home/jungleman/Desktop/mumble/3rdparty/opus/celt/celt_decoder.c:118
#6  0x00007fffeaf41a6a in celt_decode_with_ec
    (st=0x7fffe40a1d10, data=0x7fffe4063c7b "\377\376", len=219, pcm=0x7fffe40acdd0, frame_size=960, dec=0x7fffebffc100, accum=0)
    at /home/jungleman/Desktop/mumble/3rdparty/opus/celt/celt_decoder.c:873
#7  0x00007fffeaef99ef in opus_decode_frame (st=0x7fffe409fb20, data=0x7fffe4063c7b "\377\376", len=219, pcm=0x7fffe40acdd0, frame_size=960, decode_fec=0)
    at /home/jungleman/Desktop/mumble/3rdparty/opus/src/opus_decoder.c:518
#8  0x00007fffeaefa7b5 in opus_decode_native
    (st=0x7fffe409fb20, data=0x7fffe4063c7b "\377\376", len=442, pcm=0x7fffe40acdd0, frame_size=5760, decode_fec=0, self_delimited=0, packet_offset=0x0, soft_clip=0) at /home/jungleman/Desktop/mumble/3rdparty/opus/src/opus_decoder.c:721
#9  0x00007fffeaefaaf9 in opus_decode_float
    (st=0x7fffe409fb20, data=0x7fffe4063c78 "\373B\001\377\376", len=442, pcm=0x7fffe40acdd0, frame_size=5760, decode_fec=0)
    at /home/jungleman/Desktop/mumble/3rdparty/opus/src/opus_decoder.c:828
#10 0x00000000004fa6af in AudioOutputSpeech::prepareSampleBuffer(unsigned int) (this=0x7fffe4061b70, frameCount=570)
    at /home/jungleman/Desktop/mumble/src/mumble/AudioOutputSpeech.cpp:370
#11 0x00000000004f2e97 in AudioOutput::mix(void*, unsigned int) (this=0x13e6b40, outbuff=0x7fffebffd650, frameCount=570)
    at /home/jungleman/Desktop/mumble/src/mumble/AudioOutput.cpp:385
#12 0x0000000000718f49 in PulseAudioSystem::write_callback(pa_stream*, unsigned long, void*) (s=0x7fffe400c980, bytes=4560, userdata=0xd20000)
    at /home/jungleman/Desktop/mumble/src/mumble/PulseAudio.cpp:641
#13 0x00007ffff017586e in pa_command_request (pd=<optimized out>, command=<optimized out>, tag=<optimized out>, t=0x7fffe40761d0, userdata=0xd82520)
    at ../src/pulse/stream.c:847
#14 0x00007ffff010173c in pa_pdispatch_run
    (pd=0x7fffe40513a0, packet=packet@entry=0x7fffe4021fd0, ancil_data=ancil_data@entry=0x7fffe4005f58, userdata=userdata@entry=0xd82520)
    at ../src/pulsecore/pdispatch.c:349
#15 0x00007ffff015a7af in pstream_packet_callback (p=<optimized out>, packet=0x7fffe4021fd0, ancil_data=0x7fffe4005f58, userdata=0xd82520)
    at ../src/pulse/context.c:364
#16 0x00007ffff01065d7 in do_read (p=p@entry=0x7fffe4005cc0, re=re@entry=0x7fffe4005e88) at ../src/pulsecore/pstream.c:1023
#17 0x00007ffff0107318 in do_pstream_read_write (p=0x7fffe4005cc0) at ../src/pulsecore/pstream.c:254
#18 0x00007ffff01076d6 in srb_callback (srb=0x7fffe4021da0, userdata=0x7fffe4005cc0) at ../src/pulsecore/pstream.c:296
#19 0x00007ffff010abaa in srbchannel_rwloop (sr=0x7fffe4021da0) at ../src/pulsecore/srbchannel.c:190
#20 0x00007ffff016e013 in dispatch_pollfds (m=0xd54d20) at ../src/pulse/mainloop.c:676
#21 pa_mainloop_dispatch (m=m@entry=0xd54d20) at ../src/pulse/mainloop.c:917
#22 0x00007ffff016e666 in pa_mainloop_iterate (m=m@entry=0xd54d20, block=block@entry=1, retval=retval@entry=0x0) at ../src/pulse/mainloop.c:948
#23 0x00007ffff016e700 in pa_mainloop_run (m=0xd54d20, retval=0x0) at ../src/pulse/mainloop.c:963
#24 0x00007ffff017e5a9 in thread (userdata=0xd58e40) at ../src/pulse/thread-mainloop.c:101
#25 0x00007ffff011a53f in internal_thread_func (userdata=0xd574a0) at ../src/pulsecore/thread-posix.c:81
#26 0x00007ffff54af427 in start_thread (arg=<optimized out>) at pthread_create.c:435
#27 0x00007ffff5538810 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

powerjungle avatar Dec 26 '21 16:12 powerjungle

I'm not familiar with any of this, but is it possible we need to build opus/celt with whatever defines the "CUSTOM_MODES" thing?

Someone linked to the opus/celt source location, but here it is for reference:

#if defined(ENABLE_HARDENING) || defined(ENABLE_ASSERTIONS)
/* Make basic checks on the CELT state to ensure we don't end
   up writing all over memory. */
void validate_celt_decoder(CELTDecoder *st)
{
#ifndef CUSTOM_MODES
   celt_assert(st->mode == opus_custom_mode_create(48000, 960, NULL));
   celt_assert(st->overlap == 120);
   celt_assert(st->end <= 21);
#else
/* From Section 4.3 in the spec: "The normal CELT layer uses 21 of those bands,
   though Opus Custom (see Section 6.2) may use a different number of bands"

   Check if it's within the maximum number of Bark frequency bands instead */
   celt_assert(st->end <= 25);
#endif
   celt_assert(st->channels == 1 || st->channels == 2);
   celt_assert(st->stream_channels == 1 || st->stream_channels == 2);
   celt_assert(st->downsample > 0);
   celt_assert(st->start == 0 || st->start == 17);
   celt_assert(st->start < st->end);

jaggzh avatar Jan 22 '22 08:01 jaggzh

The main point here is that it shouldn't ever need these custom modes as we actually don't use them. My suspicion is that we are seeing the effect of some sort of memory/data corruption...

Krzmbrzl avatar Jan 23 '22 14:01 Krzmbrzl

After a discussion with @Krzmbrzl and @davidebeatrici we came to the following conclusions:

  • The crash is triggered when another user talks, not by sound simply playing.
  • This issue is most likely caused by a long-standing, dormant bug that was unearthed by commit a4fdc46a80d7da23ba02fa08c3361503b5cdc33c, which replaces a custom Opus repo with the upstream one.
  • It is not clear what the bug is as bisecting Opus is blocked by CMake support being added to Opus after the bug was introduced. It may be possible to backport the CMake patches and bisect past when they were added.
  • Mumble works fine on Ubuntu 20.04, but fails on Debian Testing and Arch. All of these have Opus 1.3.1 as a system library.
  • A working build can be produced despite encountering this bug by passing -Dbundled-opus=OFF to CMake, avoiding building Opus and instead using the system library. Alternatively, commit a4fdc46a80d7da23ba02fa08c3361503b5cdc33c can be reverted (git revert a4fdc46a) before building. Take care to clean the build directory before trying this though as Opus .so names differ and may cause issues.

diftucs avatar Feb 16 '22 22:02 diftucs

Valgrind output at the time of crash:

==7576== Thread 33:
==7576== Invalid read of size 16
==7576==    at 0x154575DC: memcpy (string_fortified.h:29)
==7576==    by 0x154575DC: silk_decode_core (3rdparty/opus/silk/decode_core.c:235)
==7576==    by 0x1545775D: silk_decode_frame (3rdparty/opus/silk/decode_frame.c:85)
==7576==    by 0x154592E5: silk_Decode (3rdparty/opus/silk/dec_API.c:300)
==7576==    by 0x154410A1: opus_decode_frame (3rdparty/opus/src/opus_decoder.c:402)
==7576==    by 0x15440871: opus_decode_native (3rdparty/opus/src/opus_decoder.c:721)
==7576==    by 0x154427A4: opus_decode_float (3rdparty/opus/src/opus_decoder.c:828)
==7576==    by 0x4DD68A: AudioOutputSpeech::prepareSampleBuffer(unsigned int) (src/mumble/AudioOutputSpeech.cpp:370)
==7576==    by 0x4D7F33: AudioOutput::mix(void*, unsigned int) (src/mumble/AudioOutput.cpp:385)
==7576==    by 0x68857C: PipeWireOutput::processCallback(void*) (src/mumble/PipeWire.cpp:410)
==7576==    by 0x13E8666A: ??? (in /usr/lib/x86_64-linux-gnu/libpipewire-0.3.so.0.345.0)
==7576==    by 0x30C58AA6: ??? (in /usr/lib/x86_64-linux-gnu/spa-0.2/audioconvert/libspa-audioconvert.so)
==7576==    by 0x13E63580: ??? (in /usr/lib/x86_64-linux-gnu/libpipewire-0.3.so.0.345.0)
==7576==  Address 0x31cf3d20 is on thread 33's stack
==7576==  3904 bytes below stack pointer
==7576== 
==7576== Invalid read of size 16
==7576==    at 0x154575E0: memcpy (string_fortified.h:29)
==7576==    by 0x154575E0: silk_decode_core (3rdparty/opus/silk/decode_core.c:235)
==7576==    by 0x1545775D: silk_decode_frame (3rdparty/opus/silk/decode_frame.c:85)
==7576==    by 0x154592E5: silk_Decode (3rdparty/opus/silk/dec_API.c:300)
==7576==    by 0x154410A1: opus_decode_frame (3rdparty/opus/src/opus_decoder.c:402)
==7576==    by 0x15440871: opus_decode_native (3rdparty/opus/src/opus_decoder.c:721)
==7576==    by 0x154427A4: opus_decode_float (3rdparty/opus/src/opus_decoder.c:828)
==7576==    by 0x4DD68A: AudioOutputSpeech::prepareSampleBuffer(unsigned int) (src/mumble/AudioOutputSpeech.cpp:370)
==7576==    by 0x4D7F33: AudioOutput::mix(void*, unsigned int) (src/mumble/AudioOutput.cpp:385)
==7576==    by 0x68857C: PipeWireOutput::processCallback(void*) (src/mumble/PipeWire.cpp:410)
==7576==    by 0x13E8666A: ??? (in /usr/lib/x86_64-linux-gnu/libpipewire-0.3.so.0.345.0)
==7576==    by 0x30C58AA6: ??? (in /usr/lib/x86_64-linux-gnu/spa-0.2/audioconvert/libspa-audioconvert.so)
==7576==    by 0x13E63580: ??? (in /usr/lib/x86_64-linux-gnu/libpipewire-0.3.so.0.345.0)
==7576==  Address 0x31cf3d30 is on thread 33's stack
==7576==  3888 bytes below stack pointer
==7576== 
==7576== Invalid read of size 16
==7576==    at 0x154575E5: memcpy (string_fortified.h:29)
==7576==    by 0x154575E5: silk_decode_core (3rdparty/opus/silk/decode_core.c:235)
==7576==    by 0x1545775D: silk_decode_frame (3rdparty/opus/silk/decode_frame.c:85)
==7576==    by 0x154592E5: silk_Decode (3rdparty/opus/silk/dec_API.c:300)
==7576==    by 0x154410A1: opus_decode_frame (3rdparty/opus/src/opus_decoder.c:402)
==7576==    by 0x15440871: opus_decode_native (3rdparty/opus/src/opus_decoder.c:721)
==7576==    by 0x154427A4: opus_decode_float (3rdparty/opus/src/opus_decoder.c:828)
==7576==    by 0x4DD68A: AudioOutputSpeech::prepareSampleBuffer(unsigned int) (src/mumble/AudioOutputSpeech.cpp:370)
==7576==    by 0x4D7F33: AudioOutput::mix(void*, unsigned int) (src/mumble/AudioOutput.cpp:385)
==7576==    by 0x68857C: PipeWireOutput::processCallback(void*) (src/mumble/PipeWire.cpp:410)
==7576==    by 0x13E8666A: ??? (in /usr/lib/x86_64-linux-gnu/libpipewire-0.3.so.0.345.0)
==7576==    by 0x30C58AA6: ??? (in /usr/lib/x86_64-linux-gnu/spa-0.2/audioconvert/libspa-audioconvert.so)
==7576==    by 0x13E63580: ??? (in /usr/lib/x86_64-linux-gnu/libpipewire-0.3.so.0.345.0)
==7576==  Address 0x31cf3d40 is on thread 33's stack
==7576==  3872 bytes below stack pointer
==7576== 
==7576== Invalid read of size 16
==7576==    at 0x154575EA: memcpy (string_fortified.h:29)
==7576==    by 0x154575EA: silk_decode_core (3rdparty/opus/silk/decode_core.c:235)
==7576==    by 0x1545775D: silk_decode_frame (3rdparty/opus/silk/decode_frame.c:85)
==7576==    by 0x154592E5: silk_Decode (3rdparty/opus/silk/dec_API.c:300)
==7576==    by 0x154410A1: opus_decode_frame (3rdparty/opus/src/opus_decoder.c:402)
==7576==    by 0x15440871: opus_decode_native (3rdparty/opus/src/opus_decoder.c:721)
==7576==    by 0x154427A4: opus_decode_float (3rdparty/opus/src/opus_decoder.c:828)
==7576==    by 0x4DD68A: AudioOutputSpeech::prepareSampleBuffer(unsigned int) (src/mumble/AudioOutputSpeech.cpp:370)
==7576==    by 0x4D7F33: AudioOutput::mix(void*, unsigned int) (src/mumble/AudioOutput.cpp:385)
==7576==    by 0x68857C: PipeWireOutput::processCallback(void*) (src/mumble/PipeWire.cpp:410)
==7576==    by 0x13E8666A: ??? (in /usr/lib/x86_64-linux-gnu/libpipewire-0.3.so.0.345.0)
==7576==    by 0x30C58AA6: ??? (in /usr/lib/x86_64-linux-gnu/spa-0.2/audioconvert/libspa-audioconvert.so)
==7576==    by 0x13E63580: ??? (in /usr/lib/x86_64-linux-gnu/libpipewire-0.3.so.0.345.0)
==7576==  Address 0x31cf3d50 is on thread 33's stack
==7576==  3856 bytes below stack pointer
==7576== 
Fatal (internal) error in 3rdparty/opus/celt/celt_decoder.c, line 118: assertion failed: st->mode == opus_custom_mode_create(48000, 960, NULL)
==7576== 
==7576== Process terminating with default action of signal 6 (SIGABRT)
==7576==    at 0x7181891: raise (raise.c:50)
==7576==    by 0x716B535: abort (abort.c:79)
==7576==    by 0x15474C1B: celt_fatal (3rdparty/opus/celt/arch.h:79)
==7576==    by 0x1547E156: validate_celt_decoder (3rdparty/opus/celt/celt_decoder.c:118)
==7576==    by 0x1547E481: celt_decode_with_ec (3rdparty/opus/celt/celt_decoder.c:873)
==7576==    by 0x15441644: opus_decode_frame (3rdparty/opus/src/opus_decoder.c:518)
==7576==    by 0x15440871: opus_decode_native (3rdparty/opus/src/opus_decoder.c:721)
==7576==    by 0x154427A4: opus_decode_float (3rdparty/opus/src/opus_decoder.c:828)
==7576==    by 0x4DD68A: AudioOutputSpeech::prepareSampleBuffer(unsigned int) (src/mumble/AudioOutputSpeech.cpp:370)
==7576==    by 0x4D7F33: AudioOutput::mix(void*, unsigned int) (src/mumble/AudioOutput.cpp:385)
==7576==    by 0x68857C: PipeWireOutput::processCallback(void*) (src/mumble/PipeWire.cpp:410)
==7576==    by 0x13E8666A: ??? (in /usr/lib/x86_64-linux-gnu/libpipewire-0.3.so.0.345.0)
==7576== 
==7576== HEAP SUMMARY:
==7576==     in use at exit: 25,283,167 bytes in 132,219 blocks
==7576==   total heap usage: 884,069 allocs, 751,850 frees, 310,362,964 bytes allocated

davidebeatrici avatar Feb 17 '22 02:02 davidebeatrici

  1. Still fails with my Debian Stable system if Opus is included. How do we get Opus to fix their stuff (assuming we're not sending them invalid memory they eventually memcpy)? :)
  2. Building with opus disabled works. I'm currently using this cmake line: cmake .. -Doverlay-xcompile=OFF -Dbundled-opus=OFF

Valgrind output at the time of crash:

==7576== Thread 33:
==7576== Invalid read of size 16
==7576==    at 0x154575DC: memcpy (string_fortified.h:29)
==7576==    by 0x154575DC: silk_decode_core (3rdparty/opus/silk/decode_core.c:235)

...

jaggzh avatar Mar 25 '22 11:03 jaggzh

How do we get Opus to fix their stuff (assuming we're not sending them invalid memory they eventually memcpy)? :)

Well we'd have to figure out the exact cause of the issue. I still think it might be possible that IR is Mumble that somehow is causing a memory corruption :thinking:

Krzmbrzl avatar Mar 25 '22 13:03 Krzmbrzl

Looking at the commits that were made to upstream Opus since we updated our submodule to point to their repo, the following commits catch my eyes:

  • https://github.com/xiph/opus/commit/ec64b3c5b7abd621dfddee6b4cc115298e5d6803
  • https://github.com/xiph/opus/commit/12a356e431d1b2d3531d3d73de330bf9ee9be48b
  • https://github.com/xiph/opus/commit/ccb42e05cc6eb98a15874b9695361025b3ee17ab

If we're lucky, it is enough to update our Opus submodule to avoid these issues :eyes:

Krzmbrzl avatar Sep 12 '22 10:09 Krzmbrzl

Could some of those who could reproduce the crash with the bundled Opus version please check out #5879 (and execute git submodule update --init --recursive before building - which should note that it updated the Opus module) to see if incorporating these fixes is enough to fix the issue at hand?

Krzmbrzl avatar Sep 12 '22 10:09 Krzmbrzl

@Krzmbrzl Sorry for the late response! I built your tree using the branch "update-opus". Ran mumble, and tested the audio wizard using ALSA and PulseAudio. Everything seems to work fine. No errors, no crashes so far.

powerjungle avatar Sep 22 '22 13:09 powerjungle

Thanks :+1: In the meantime the "update" is actually a downgrade as Davide has already proven that the upgrade didn't fix the issue. It's good to hear that the downgrade fixes it, though.

Krzmbrzl avatar Sep 22 '22 15:09 Krzmbrzl

Yeah, glad to know the root cause (not found yet) appears to be the same for everyone.

davidebeatrici avatar Sep 22 '22 22:09 davidebeatrici

Hm.... I had slightly modified the now merged PR thinking that the code was still the same. However, now I can reliably reproduce this crash on the master branch. Could someone please re-test whether this issue exists on master (potentially after having updated the submodule) or not?

Krzmbrzl avatar Dec 26 '22 19:12 Krzmbrzl

So as it turns out that downgrade didn't fix this issue after all...

Krzmbrzl avatar Dec 27 '22 16:12 Krzmbrzl