Short audio interruption when resyncing crypt-nonce
Issue description
We're currently experiencing regular (every few minutes) short audio interruptions (~5-10 secs, bidirectional) for certain users (AFAICT those with more recent client versions). I noticed that every time this happened, the server logged the following line (with the affected user's name in it):
<W>2021-05-12 01:26:37.132 1 => <86:s3lph(1)> Requested crypt-nonce resync
Expected behavior
The crypt nonce resync (I don't have enough insight into Mumble's crypto to understand what nonce that is) should complete without audio interruption.
Actual behavior
There is a short audio interruption for the affected user during crypt-nonce resync.
Steps to reproduce
- Use the latest Mumble client
- Hold a conversation with someone else
- Monitor the log while talking
- Note that there is a short audio interruption whenever the server logs "Requested crypt-nonce resync"
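To correlate the interruptions with resync requests, the server log can be scanned for the warning quoted above. The following is a minimal sketch: the regular expression is modeled only on the single log line in this report, so other Murmur versions may need a different pattern, and `count_resyncs` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this report (assumption:
# other server versions may format the line differently).
RESYNC_RE = re.compile(
    r"^<W>(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\d+ => <\d+:(?P<user>.+)\(-?\d+\)> Requested crypt-nonce resync$"
)


def count_resyncs(lines):
    """Return a Counter mapping user name to the number of resync requests."""
    counts = Counter()
    for line in lines:
        match = RESYNC_RE.match(line.strip())
        if match:
            counts[match.group("user")] += 1
    return counts


sample = [
    # Taken verbatim from the issue description.
    "<W>2021-05-12 01:26:37.132 1 => <86:s3lph(1)> Requested crypt-nonce resync",
    # Placeholder for any unrelated line; it is ignored by the pattern.
    "some unrelated log line",
]
print(count_resyncs(sample))
```

In practice the `lines` argument would be an open file handle on the server log, so that the per-user counts can be compared against the times at which audio dropped out.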
Environment
- Client: mumble master (13db7bded) on Arch Linux
- Server: murmur 1.3.0 (Debian package) on Debian buster
Hm that's interesting. Afaik the need for a crypt-nonce resync should not occur that often. @davidebeatrici any idea what might be the issue?
Issue hasn't reappeared after upgrading the Server to murmur 1.3.4. As far as I'm concerned, this issue could now be considered obsolete.
That is great to hear. Then I'll close this issue and let's hope for the best that it stays away :)
@az33zy what version was the affected client using? @s3lph what was the client version in your case?
> @s3lph what was the client version in your case?
I was using the then-latest commit on master, installed via https://aur.archlinux.org/packages/mumble-git/. Other affected users were using 1.3.3 or 1.3.4 clients.
@davidebeatrici do you think that this could be related to the odd resync logic (the one with the arbitrary timer) that we talked about not too long ago, and that we wanted to replace with logic that checks the number of undecryptable packets instead?
Entirely possible, yes.
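For illustration only, a resync trigger based on undecryptable packets rather than a timer might look roughly like the sketch below. Nothing here is taken from Mumble's code: `ResyncTracker`, its `threshold` parameter, and the choice to count consecutive failures are all assumptions made for the example.

```python
class ResyncTracker:
    """Hypothetical helper: requests a resync after N consecutive failures."""

    def __init__(self, threshold=5):
        # Assumed value; a real implementation would need to tune this.
        self.threshold = threshold
        self.failures = 0

    def on_packet(self, decrypted_ok):
        """Record one decryption result.

        Returns True when a crypt-nonce resync should be requested.
        """
        if decrypted_ok:
            # Any successful decryption shows the nonces are still in sync.
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            # Reset so that only one request is sent per run of failures.
            self.failures = 0
            return True
        return False


tracker = ResyncTracker(threshold=3)
results = [tracker.on_packet(ok) for ok in [False, False, True, False, False, False]]
print(results)
```

With this approach a single late or lost packet would not trigger a resync; only a sustained run of failures would.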
We could adopt the solution that is used in SoftEther VPN's WireGuard implementation: https://github.com/SoftEtherVPN/SoftEtherVPN/blob/master/src/Cedar/Proto_WireGuard.c https://github.com/SoftEtherVPN/SoftEtherVPN/blob/master/src/Cedar/Proto_WireGuard.h
The relevant code is in `WgsProcessTransportData()`.
In order not to lose any packets, the previous keypair is used, as long as it's not expired.
> We could adopt the solution that is used in SoftEther VPN's WireGuard implementation:
Could you put that to words as well? The code itself doesn't seem to tell me the bigger picture of what is going on :thinking:
There are 3 concurrent keypairs maximum: `previous`, `current`, `next`.
In WireGuard's case the identifier for each keypair is a 32 bit index.
1. Session established: `current` is used to encrypt/decrypt packets. `previous` and `next` are NULL.
2. Some time passes: `next` becomes available.
3. When decrypting a received packet, we identify the correct keypair to use:
   - `current`: No changes.
   - `previous`: Used instead of `current`. The packet is out of order because it was encrypted before packets that were received earlier.
   - `next`: `current` becomes `previous`, `next` becomes `current` and `next` becomes NULL.

   We now check whether the keypair is expired (i.e. it was used too many times). If it is, we discard the packet.

Step 2 is performed regularly in a healthy session.
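The keypair selection described above can be sketched as follows. This is a simplified model, not SoftEther's or Mumble's code: `Keypair`, `Session`, `select`, and the `max_uses` limit are hypothetical names and values, and the actual decryption is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Keypair:
    index: int            # identifier carried in each packet (32 bit in WireGuard)
    uses: int = 0
    max_uses: int = 1000  # assumed limit, for illustration only

    def expired(self) -> bool:
        return self.uses >= self.max_uses


class Session:
    def __init__(self, current: Keypair):
        # Session established: only `current` exists.
        self.previous: Optional[Keypair] = None
        self.current: Keypair = current
        self.next: Optional[Keypair] = None

    def select(self, index: int) -> Optional[Keypair]:
        """Pick the keypair for a received packet, or None to discard it."""
        if self.current.index == index:
            keypair = self.current
        elif self.previous is not None and self.previous.index == index:
            # Out-of-order packet encrypted with the older keypair.
            keypair = self.previous
        elif self.next is not None and self.next.index == index:
            # The peer switched keys: rotate previous <- current <- next.
            self.previous, self.current, self.next = self.current, self.next, None
            keypair = self.current
        else:
            return None  # unknown keypair
        if keypair.expired():
            return None  # used too many times
        keypair.uses += 1
        return keypair


session = Session(Keypair(index=1))
session.next = Keypair(index=2)    # step 2: a new keypair becomes available
print(session.select(2).index)     # packet under `next` triggers the rotation
print(session.select(1).index)     # a late packet still matches `previous`
```

The point of keeping `previous` around is that packets already in flight when the rotation happens can still be decrypted, which is what would avoid the audio gap.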
Yeah that sounds like a sound concept to use :+1:
I am running into this issue. I noticed that it happens more often the more users are connected. With only 2 users it happens rarely or never, but as soon as 3 users are connected, it happens regularly, at least every 15 minutes. With 4 or more users it happens at least every 5 minutes.
Is there anything I can do or provide to help get this issue fixed?
Here is some information about the server and the clients:
Server:
- Alpine Linux 3.19/edge, murmurd 1.4.287
Clients:
- Arch Linux mumble 1.5.517
- Windows mumble 1.3.3
- Windows mumble 1.4.287
- Debian mumble 1.3.4
> Is there anything I can do or provide to help get this issue fixed?
Short of providing a PR with a fix, I don't think there currently is anything 🤔