
Weird audio issues with high audio per packet settings

Open bkacjios opened this issue 9 months ago • 9 comments

Description

Hello, I am the author of lua-mumble, a module for allowing someone to create a bot via the Lua scripting language.

I've been updating it to support the new protobuf UDP packets that were introduced in 1.5 and have been running into this weird issue. I have a test bot that is configured to send 60 ms of audio per packet at a bitrate of 96000 bit/s. It is also configured to output stereo audio. The server has a max bitrate of 558000 bit/s.

I noticed that the bot sometimes had audio issues I couldn't explain. Sometimes I would start the bot and the audio would stutter from the start, other times I would start it and it would sound perfect. The only way I was able to reproduce it consistently was by setting the audio per packet to 60 ms on the bot and muting/unmuting my client.

https://github.com/mumble-voip/mumble/assets/3247233/d4b7a660-76a1-4da9-999c-9128223647fa

I even managed to get this to happen when my bot was not playing any music at all. Instead I had it loop back my audio so I could hear myself through the bot. The stuttering issues were also present. It almost seems like my client's decoder is trying to decode it in some other mode or something, but honestly I'm not too sure what is going on here.

Would anyone perhaps know what is going on here? I'm still unsure if this is an issue with the way I am encoding audio or an issue with how mumble is decoding audio.

Steps to reproduce

Have a client start transmitting with 60 ms of audio per packet. Mute and unmute the client. Its audio will now stutter.

Mumble version

1.5.628

Mumble component

Client

OS

Windows

Reproducible?

Yes

Additional information

No response

Relevant log output

No response

Screenshots

No response

bkacjios avatar May 10 '24 14:05 bkacjios

I have encountered a similar issue when working on a project based on libmumble.

Turns out the Mumble client struggles with packets that contain more than 10ms (480 samples at 48000 Hz) of audio.

Try to send 480 samples (per channel): the issue should not be present.
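As a quick check of the numbers above, here is a minimal Python sketch that converts a packet duration into a per-channel sample count. The function name `samples_per_channel` is made up for illustration and is not part of Mumble or libopus.

```python
def samples_per_channel(duration_ms: float, sample_rate_hz: int = 48000) -> int:
    """Number of samples per channel that cover duration_ms at sample_rate_hz."""
    samples = duration_ms * sample_rate_hz / 1000
    if samples != int(samples):
        raise ValueError("duration does not map to a whole number of samples")
    return int(samples)


# Durations mentioned in this thread.
for ms in (10, 20, 40, 60):
    print(ms, "ms ->", samples_per_channel(ms), "samples per channel")
```

At 48000 Hz, 10 ms corresponds to 480 samples per channel and 60 ms to 2880.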

davidebeatrici avatar May 10 '24 20:05 davidebeatrici

Good to know. I was kinda going crazy thinking I was doing something wrong somewhere. Yeah, setting it to 10 ms seems to work fine. 20 is okay too mostly, but 40 and 60 are basically unusable. I usually like to use at least 20 ms since I find it seems to keep the audio clarity stable. I notice little hiccups or stutters with 10 that are nowhere near as bad as this though.

bkacjios avatar May 10 '24 22:05 bkacjios

Thank you for confirming.

This is definitely something that has to be fixed in the client. Ideally it should handle all frame sizes that are supported by libopus.
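For reference, a small Python sketch of the frame sizes a receiver would need to cope with. It assumes the set of durations documented for `opus_encode()` (2.5, 5, 10, 20, 40 and 60 ms); newer libopus releases may accept longer frames, so treat the table as an assumption rather than an exhaustive list. The helper names are hypothetical.

```python
# Frame durations in milliseconds documented for opus_encode().
# Assumption: newer libopus releases may accept additional, longer durations.
OPUS_FRAME_DURATIONS_MS = (2.5, 5, 10, 20, 40, 60)


def valid_frame_sizes(sample_rate_hz: int = 48000) -> list:
    """Per-channel sample counts that correspond to the durations above."""
    return [int(ms * sample_rate_hz / 1000) for ms in OPUS_FRAME_DURATIONS_MS]


def is_valid_frame_size(samples_per_channel: int, sample_rate_hz: int = 48000) -> bool:
    """True if the sample count matches one of the listed Opus frame durations."""
    return samples_per_channel in valid_frame_sizes(sample_rate_hz)


print(valid_frame_sizes())
```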

https://github.com/mumble-voip/mumble/blob/dfc8dacae3f37ecd522d73fe7db9a0a91060e0ae/src/mumble/Audio.h#L22 https://github.com/mumble-voip/mumble/blob/dfc8dacae3f37ecd522d73fe7db9a0a91060e0ae/src/mumble/AudioInput.h#L219-L224

The iSampleRate and iFrameSize variables exist because the code itself doesn't consider the sample rate and frame size to be hardcoded.

Unfortunately an option to change them was never added and as a result problems like this one went unnoticed until third-party clients (e.g. bots) started popping up.

davidebeatrici avatar May 11 '24 01:05 davidebeatrici

So wait, I'm a little confused. I thought this setting in the client adjusts it? Are you saying the client just thinks all incoming audio is 10 ms?

image

I made my bot mimic this functionality. I was defaulting it to 20 ms like mumble does on a fresh install, but I can adjust it via a method call.

https://github.com/bkacjios/lua-mumble/blob/54c2272e2f0ba06bacf87c2fdc21441d14505140/mumble/mumble.h#L81 https://github.com/bkacjios/lua-mumble/blob/54c2272e2f0ba06bacf87c2fdc21441d14505140/mumble/audio.c#L196

bkacjios avatar May 11 '24 02:05 bkacjios

The setting itself adjusts the number of 10ms chunks to send per packet, not the actual number of audio frames:

https://github.com/mumble-voip/mumble/blob/dfc8dacae3f37ecd522d73fe7db9a0a91060e0ae/src/mumble/AudioInput.h#L235-L239 https://github.com/mumble-voip/mumble/blob/dfc8dacae3f37ecd522d73fe7db9a0a91060e0ae/src/mumble/AudioInput.cpp#L694-L728 https://github.com/mumble-voip/mumble/blob/dfc8dacae3f37ecd522d73fe7db9a0a91060e0ae/src/mumble/AudioInput.cpp#L1105-L1140

Another issue in the code above is that it doesn't clamp the number of chunks to match iAudioFrame's value.

Putting that aside, let's see what is going on in the audio output section:

https://github.com/mumble-voip/mumble/blob/dfc8dacae3f37ecd522d73fe7db9a0a91060e0ae/src/mumble/AudioOutputSpeech.cpp#L58-L153

As you can see, the code assumes that incoming frames are always 10ms. This is wrong because (at least in our case) the Opus encoder always produces packets that contain a single encoded frame.

Basically, there is no concept of "chunks" in encoded packets, regardless of the client's audio input settings.

Finally, just to add some more confusion to the mix (and possibly clarify it): xiph/opus#315
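To illustrate that an Opus packet carries its own duration, here is a rough Python sketch that reads the TOC byte as laid out in RFC 6716, section 3.1. It is not how the Mumble client does it, and the function name `opus_packet_duration` is hypothetical. In C, libopus exposes `opus_packet_get_nb_samples()` for the same purpose.

```python
def opus_packet_duration(packet: bytes, sample_rate_hz: int = 48000):
    """Return (frame_count, samples_per_frame, total_samples) for an Opus packet."""
    if not packet:
        raise ValueError("empty packet")

    toc = packet[0]
    config = toc >> 3   # top five bits: mode, bandwidth and frame duration
    code = toc & 0x03   # low two bits: how many frames the packet holds

    if config < 12:     # SILK-only: 10, 20, 40 or 60 ms
        frame_ms = (10, 20, 40, 60)[config & 0x03]
    elif config < 16:   # Hybrid: 10 or 20 ms
        frame_ms = (10, 20)[config & 0x01]
    else:               # CELT-only: 2.5, 5, 10 or 20 ms
        frame_ms = (2.5, 5, 10, 20)[config & 0x03]

    if code == 0:
        frames = 1
    elif code in (1, 2):
        frames = 2
    else:
        # Code 3: the frame count is in the low six bits of the second byte.
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame count byte")
        frames = packet[1] & 0x3F

    samples_per_frame = int(frame_ms * sample_rate_hz / 1000)
    return frames, samples_per_frame, frames * samples_per_frame


# Hand-built TOC bytes for illustration only; the payload bytes are dummies.
print(opus_packet_duration(bytes([0xF0, 0x00])))  # CELT fullband, 10 ms, one frame
print(opus_packet_duration(bytes([0x18, 0x00])))  # SILK narrowband, 60 ms, one frame
```

A decoder that sizes its output buffer from the packet, instead of assuming 10 ms, would not depend on the sender's audio per packet setting.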

davidebeatrici avatar May 11 '24 20:05 davidebeatrici

Ahhh.. I understand now. My bot has a timer that determines how often it should send the audio data. So if I set the bot's audio packet size to 60, it's encoding 2880 samples into one audio packet every 60 ms. I'm guessing I should send 6 individual packets with 480 encoded samples each in one go?

bkacjios avatar May 11 '24 21:05 bkacjios

Yup!
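The workaround agreed on above could be sketched like this in Python: cut the 60 ms buffer into 10 ms frames before encoding, so that each frame goes out as its own packet. This only shows the splitting step; the encode and send calls are left out, and `split_into_frames` is a hypothetical helper, not something from lua-mumble.

```python
def split_into_frames(pcm, samples_per_frame=480, channels=1):
    """Split interleaved PCM samples into frames of samples_per_frame per channel."""
    step = samples_per_frame * channels
    if len(pcm) % step != 0:
        raise ValueError("buffer is not a whole number of frames")
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]


# 60 ms of silent, interleaved stereo audio at 48000 Hz (2880 samples per channel).
buffer_60ms = [0] * (2880 * 2)
frames = split_into_frames(buffer_60ms, samples_per_frame=480, channels=2)
print(len(frames), "frames of", len(frames[0]), "interleaved samples each")
```

Each of the resulting frames would then be encoded on its own and sent as a separate packet.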

davidebeatrici avatar May 11 '24 22:05 davidebeatrici

Am I missing something?

When my bot receives audio from me speaking, it isn't getting 10 ms chunks if I set my client to 60 ms of audio per packet. Doesn't this go against what you said? Shouldn't I still be receiving 6 individual 240-byte packets every 0.06 seconds?

OnUserStartSpeaking     mumble.user [137]["Bkacjios"]
received 960 bytes in 0.059990 seconds
[MUMBLE - TRACE] RECEIVED MumbleUDP.Audio
received 960 bytes in 0.059892 seconds
[MUMBLE - TRACE] RECEIVED MumbleUDP.Audio
received 960 bytes in 0.060152 seconds
OnUserStopSpeaking      mumble.user [137]["Bkacjios"]

Compared to the results when I have it set to 10 ms.

[MUMBLE - TRACE] RECEIVED MumbleUDP.Audio
OnUserStartSpeaking     mumble.user [137]["Bkacjios"]
received 240 bytes in 0.009980 seconds
[MUMBLE - TRACE] RECEIVED MumbleUDP.Audio
received 240 bytes in 0.009977 seconds
[MUMBLE - TRACE] RECEIVED MumbleUDP.Audio
received 240 bytes in 0.009891 seconds
OnUserStopSpeaking      mumble.user [137]["Bkacjios"]

bkacjios avatar May 12 '24 17:05 bkacjios

Are you referring to the size of the encoded packet or the raw audio data?

davidebeatrici avatar May 12 '24 20:05 davidebeatrici

Yeah, this was the encoded packet I receive from a speaking client.

I'm starting to think we had a misunderstanding here.

> Turns out the Mumble client struggles with packets that contain more than 10ms (480 samples at 48000 Hz) of audio.
>
> Try to send 480 samples (per channel): the issue should not be present.

My bot resamples all playing audio to 48000 Hz, since that's what Mumble expects. I always encode 480 samples per 10 ms. The issue I am having is with sending more than 10 ms per packet (replicating the audio per packet setting in an official client).

If I set it to 20, I will encode (480 * 2) bytes of PCM data and send that over. Is that not correct?

bkacjios avatar May 17 '24 13:05 bkacjios

> If I set it to 20, I will encode (480 * 2) bytes of PCM data and send that over. Is that not correct?

With libopus you can encode either 16 bit (2 bytes) signed integer or 32 bit (4 bytes) float samples. You can choose by calling either opus_encode() or opus_encode_float(), respectively.

20ms (0.02s) of mono 16 bit audio data at 48000Hz would be:

(20 / 1000) * (16 / 8) * 48000 = 1920 bytes

Multiply the result by the number of channels (3840 bytes for stereo).
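The same arithmetic as a small Python sketch, in case it helps to plug in other values. The function name `pcm_size_bytes` is made up for illustration.

```python
def pcm_size_bytes(duration_ms, sample_rate_hz=48000, bits_per_sample=16, channels=1):
    """Size in bytes of raw PCM audio covering duration_ms."""
    samples_per_channel = duration_ms * sample_rate_hz // 1000
    return samples_per_channel * (bits_per_sample // 8) * channels


print(pcm_size_bytes(20))                      # mono, 16-bit
print(pcm_size_bytes(20, channels=2))          # stereo, 16-bit
print(pcm_size_bytes(20, bits_per_sample=32))  # mono, 32-bit float
```

Note that the `frame_size` argument of `opus_encode()` is a number of samples per channel, not a byte count, so 20 ms at 48000 Hz is passed as 960 regardless of the sample format.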

davidebeatrici avatar May 17 '24 20:05 davidebeatrici