
Short audio interruption when resyncing crypt-nonce

Open s3lph opened this issue 3 years ago • 12 comments

Issue description

We're currently experiencing regular (every few minutes) short audio interruptions (~5-10 secs, bidirectional) for certain users (AFAICT those with more recent client versions). I noticed that every time this happened, the server logged the following line (with the affected user's name in it):

<W>2021-05-12 01:26:37.132 1 => <86:s3lph(1)> Requested crypt-nonce resync

Expected behavior

The crypt nonce resync (I don't have enough insight into Mumble's crypto to understand what nonce that is) should complete without audio interruption.

Actual behavior

There is a short audio interruption for the affected user during crypt-nonce resync.

Steps to reproduce

  1. Use the latest Mumble client
  2. Hold a conversation with someone else
  3. Monitor the log while talking
  4. Note that a short audio interruption occurs whenever the server logs "Requested crypt-nonce resync"
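
To make step 3 less manual, the log can be scanned for resync requests per user. The following is only a sketch: the regular expression is modeled on the single log line quoted above, so other server versions may need a different pattern, and `count_resyncs` is a hypothetical helper name, not part of Mumble.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this issue (assumption:
# other murmur versions may format the line differently).
RESYNC_RE = re.compile(
    r"^<W>(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<server>\d+) => <(?P<session>\d+):(?P<user>.*)\((?P<uid>-?\d+)\)> "
    r"Requested crypt-nonce resync$"
)


def count_resyncs(lines):
    """Return a Counter mapping user name -> number of resync requests."""
    counts = Counter()
    for line in lines:
        match = RESYNC_RE.match(line.strip())
        if match:
            counts[match.group("user")] += 1
    return counts


sample = [
    # Line copied from the issue description.
    "<W>2021-05-12 01:26:37.132 1 => <86:s3lph(1)> Requested crypt-nonce resync",
    # Made-up non-matching line, only here to show that it is ignored.
    "this line is not a resync request",
]
print(count_resyncs(sample))
```

Feeding the real log file into `count_resyncs` (for example via `open(path)`) would show whether the resyncs cluster on particular users.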

Environment

  • Client: mumble master (13db7bded) on Arch Linux
  • Server: murmur 1.3.0 (Debian package) on Debian buster

s3lph avatar May 12 '21 00:05 s3lph

Hm that's interesting. Afaik the need for a crypt-nonce resync should not occur that often. @davidebeatrici any idea what might be the issue?

Krzmbrzl avatar May 12 '21 07:05 Krzmbrzl

Issue hasn't reappeared after upgrading the Server to murmur 1.3.4. As far as I'm concerned, this issue could now be considered obsolete.

s3lph avatar Jun 07 '21 15:06 s3lph

That is great to hear. Then I'll close this issue and let's hope for the best that it stays away :)

Krzmbrzl avatar Jun 07 '21 16:06 Krzmbrzl

@az33zy what version was the affected client using? @s3lph what was the client version in your case?

Krzmbrzl avatar Jul 31 '21 05:07 Krzmbrzl

> @s3lph what was the client version in your case?

I was using the then-latest commit on master, installed via https://aur.archlinux.org/packages/mumble-git/. Other affected users were using 1.3.3 or 1.3.4 clients.

s3lph avatar Aug 01 '21 00:08 s3lph

@davidebeatrici do you think that this could be related to the odd resync logic (the one with the arbitrary timer) that we talked about not too long ago and that we wanted to replace with logic that checks the number of undecryptable packets instead?
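
For illustration only, a count-based trigger of that kind could look roughly like the sketch below. Everything in it is an assumption: `ResyncTrigger`, `record` and the threshold of 10 are hypothetical and do not reflect Mumble's actual code or any agreed-upon design.

```python
class ResyncTrigger:
    """Hypothetical helper: ask for a resync after N consecutive failures."""

    def __init__(self, max_failures=10):
        # Threshold is an arbitrary illustrative value.
        self.max_failures = max_failures
        self.failures = 0

    def record(self, decrypted_ok):
        """Record one packet; return True when a resync should be requested."""
        if decrypted_ok:
            # Any successful decryption shows the nonces are still in sync.
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0  # start counting afresh after requesting
            return True
        return False


trigger = ResyncTrigger(max_failures=3)
for ok in [False, False, True, False, False, False]:
    if trigger.record(ok):
        print("would request crypt-nonce resync")
```

Unlike a timer, this only fires when packets actually fail to decrypt several times in a row, so an idle but healthy connection never requests a resync.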

Krzmbrzl avatar Aug 01 '21 05:08 Krzmbrzl

Entirely possible, yes.

We could adopt the solution that is used in SoftEther VPN's WireGuard implementation: https://github.com/SoftEtherVPN/SoftEtherVPN/blob/master/src/Cedar/Proto_WireGuard.c https://github.com/SoftEtherVPN/SoftEtherVPN/blob/master/src/Cedar/Proto_WireGuard.h

The relevant code is in WgsProcessTransportData(). In order not to lose any packets, the previous keypair is used, as long as it's not expired.

davidebeatrici avatar Aug 01 '21 10:08 davidebeatrici

> We could adopt the solution that is used in SoftEther VPN's WireGuard implementation:

Could you put that into words as well? The code itself doesn't seem to tell me the bigger picture of what is going on :thinking:

Krzmbrzl avatar Aug 01 '21 12:08 Krzmbrzl

There are 3 concurrent keypairs maximum: previous, current, next.

In WireGuard's case the identifier for each keypair is a 32 bit index.

  1. Session established: current is used to encrypt/decrypt packets. previous and next are NULL.

  2. Some time passes: next becomes available.

  3. When decrypting a received packet, we identify the correct keypair to use:

    • current: No changes.
    • previous: Used instead of current. The packet is out-of-order because it was encrypted before packets that were received earlier.
    • next: current becomes previous, next becomes current, and next is reset to NULL.

    We now check whether the keypair is expired (i.e. it was used too many times). If it is, we discard the packet.

Step 2 is performed regularly in a healthy session.
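
The selection logic in step 3 can be sketched as follows. This is a simplified model, not SoftEther's or WireGuard's actual code: `Keypair`, `Session`, `select` and the `MAX_USES` limit are hypothetical names and values chosen for illustration, and real decryption is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional

MAX_USES = 3  # illustrative expiry limit, not a real protocol value


@dataclass
class Keypair:
    index: int     # stands in for the 32-bit keypair identifier
    uses: int = 0  # how many packets this keypair has handled

    def expired(self) -> bool:
        return self.uses >= MAX_USES


class Session:
    def __init__(self, current: Keypair):
        # Step 1: only `current` exists when the session is established.
        self.previous: Optional[Keypair] = None
        self.current: Keypair = current
        self.next: Optional[Keypair] = None

    def select(self, index: int) -> Optional[Keypair]:
        """Return the keypair for a received packet, or None to discard it."""
        if self.next is not None and index == self.next.index:
            # The peer switched to the new keypair: rotate.
            self.previous, self.current, self.next = self.current, self.next, None
            keypair = self.current
        elif index == self.current.index:
            keypair = self.current
        elif self.previous is not None and index == self.previous.index:
            # Out-of-order packet encrypted before the rotation.
            keypair = self.previous
        else:
            return None  # unknown keypair identifier
        if keypair.expired():
            return None  # used too many times: discard the packet
        keypair.uses += 1
        return keypair


session = Session(Keypair(index=1))
session.next = Keypair(index=2)   # step 2: next becomes available
print(session.select(2))          # triggers the rotation
print(session.select(1))          # late packet still finds the previous keypair
```

The point of keeping `previous` around is that packets still in flight during a rotation remain decryptable, which is what would avoid the audio gap described in this issue.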

davidebeatrici avatar Aug 01 '21 15:08 davidebeatrici

Yeah that sounds like a sound concept to use :+1:

Krzmbrzl avatar Aug 01 '21 15:08 Krzmbrzl

I am running into this issue. I noticed that it happens more often the more users are connected. With only 2 users it happens rarely to never, but as soon as 3 users are connected, it happens regularly, at least every 15 minutes. With 4 or more users it happens at least every 5 minutes.

Is there anything I can do or provide to help get this issue fixed?

Here is some information about the server and the clients:

Server:

  • Alpine Linux 3.19/edge, murmurd 1.4.287

Clients:

  • Arch Linux mumble 1.5.517
  • Windows mumble 1.3.3
  • Windows mumble 1.4.287
  • Debian mumble 1.3.4

marcoSchr avatar Jan 21 '24 13:01 marcoSchr

> Is there anything I can do or provide to help get this issue fixed?

Short of providing a PR with a fix, I don't think there currently is anything 🤔

Krzmbrzl avatar Jan 21 '24 14:01 Krzmbrzl