Improve the jitter buffer

Open corrados opened this issue 3 years ago • 14 comments

The idea is: A new jitter buffer uses a timing histogram to compute the actual jitter relative to the audio device or system clock timer. A bit sequence will be added to all transmitted frames to improve the jitter histogram computation.
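
For illustration, here is a minimal C++ sketch of that idea, using hypothetical names (this is not the code from the actual patch): packet arrival offsets relative to the local audio callback are collected into a histogram, and the jitter is read off as the width of that distribution.

```cpp
// Hypothetical sketch, not the actual Jamulus / PR #539 code: build a
// histogram of packet arrival offsets measured against the local audio
// callback and derive a jitter estimate from it.
#include <algorithm>
#include <array>
#include <chrono>
#include <cstddef>

class JitterHistogram
{
public:
    // Call once per received UDP audio packet with the (non-negative) time
    // elapsed since the last audio interrupt (client) or timer interrupt
    // (server).
    void AddSample ( std::chrono::microseconds offset )
    {
        const std::size_t bin = std::min<std::size_t> (
            static_cast<std::size_t> ( offset.count() ) / usPerBin,
            bins.size() - 1 );
        bins[bin]++;
        totalSamples++;
    }

    // Spread of the arrival-time distribution covering e.g. 99% of all
    // samples; this is the jitter the buffer has to absorb.
    std::chrono::microseconds EstimateJitter ( double coverage = 0.99 ) const
    {
        std::size_t accumulated = 0;
        for ( std::size_t i = 0; i < bins.size(); i++ )
        {
            accumulated += bins[i];
            if ( accumulated >= coverage * totalSamples )
            {
                return std::chrono::microseconds ( ( i + 1 ) * usPerBin );
            }
        }
        return std::chrono::microseconds ( bins.size() * usPerBin );
    }

private:
    static constexpr std::size_t usPerBin = 250; // 0.25 ms histogram resolution
    std::array<std::size_t, 256> bins {};        // covers up to 64 ms
    std::size_t                  totalSamples = 0;
};
```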

Open issues:

  • What about compatibility with old Jamulus servers and clients?
  • How does the new auto jitter buffer algorithm behave? For what situation is it optimized? See, e.g., https://github.com/corrados/jamulus/issues/417.
  • The old algorithm was designed to measure the block error rate for different jitter buffer sizes. What is the metric for the new auto jitter buffer algorithm to set the buffer size?
  • We are using an unused bit in the OPUS bit stream. Question: Why is this bit not used? Is it a reserved bit? Do we have a reference to the OPUS specification document where this bit is mentioned? Is there a risk that a later version of OPUS starts using this bit, so that we can no longer upgrade Jamulus to a new OPUS version?
  • Documentation of the algorithm should be added so that others can understand what's going on and can improve/tune the new algorithm.

A first patch is already available: https://github.com/corrados/jamulus/pull/539.

See also the discussion in the pull request: https://github.com/corrados/jamulus/pull/529#issuecomment-678292903

corrados avatar Aug 26 '20 15:08 corrados

I tried the patch given in #539 and installed it on a Rpi 4. I ran both the server and console on this machine, using the local loopback IP address.

For the first few minutes the jitter buffer sizing was very erratic with lots of jitter red lights and overall delays wildly varying from 8ms to 50ms, with ping time mostly 0ms.

After letting it run for ten minutes it seemed to settle down a little, with the jitter light staying mostly green, but the overall delays would still jump from 9ms to 18ms; most of that seemed to be the ping time jumping back and forth from 0ms to 10ms. (Remember, this was all running locally.)

The analyzer console showed all buffer sizes pegged at the top.

I did not try to listen to any audio during this time.

bflamig avatar Aug 26 '20 16:08 bflamig

@bflamig : Can you re-test? I've made some updates. Thank you!

hselasky avatar Aug 27 '20 13:08 hselasky

What about compatibility with old Jamulus servers and clients?

No issues with compatibility.

How does the new auto jitter buffer algorithm behave? For what situation is it optimized? See, e.g., #417.

To be investigated.

The old algorithm was designed to measure the block error rate for different jitter buffer sizes.

The new algorithm is based entirely on timing. The timing uncertainty of received (RX) UDP packets relative to the audio interrupt (client side) or timer interrupt (server side) is used directly to compute the jitter buffer size.
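
A minimal sketch of that mapping, assuming a jitter estimate such as the one from the histogram sketch above (hypothetical names, not the code from the pull request): the measured spread is rounded up to whole audio frames and clamped to a sensible range.

```cpp
// Hypothetical sketch, not the code from PR #539: map the measured timing
// uncertainty onto a jitter buffer size expressed in audio frames.
#include <algorithm>
#include <chrono>
#include <cmath>

// With 64-sample frames at 48 kHz, one frame lasts 64 / 48000 s, i.e. ~1.33 ms.
constexpr std::chrono::microseconds frameDuration ( 64 * 1000000LL / 48000 );

int ComputeBufferSizeFrames ( std::chrono::microseconds measuredJitter,
                              int minFrames = 2,
                              int maxFrames = 20 )
{
    // The buffer must absorb the whole arrival-time spread: round the
    // measured jitter up to whole frames, then clamp to the allowed range.
    const int frames = static_cast<int> (
        std::ceil ( static_cast<double> ( measuredJitter.count() ) /
                    static_cast<double> ( frameDuration.count() ) ) );

    return std::clamp ( frames, minFrames, maxFrames );
}
```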

We are using an unused bit in the OPUS bit stream. Question: Why is this bit not used?

When OPUS initializes the bit stream it always adds one bit, which is then never used (at least by Jamulus :-)). See libs/opus/celt/entenc.c: _this->nbits_total=EC_CODE_BITS+1;

hselasky avatar Aug 27 '20 13:08 hselasky

@hselasky:

Will retest. Thanks.

bflamig avatar Aug 27 '20 17:08 bflamig

@hselasky: Did a quick test by compiling for both a Windows machine and a Linux machine (Rpi4). Ran the server on the Rpi, client on Windows. Result: similar behavior to the last test, but not as extreme. For the first few minutes the jitter buffer sizes are moving around, but usually only a few steps at a time. The jitter LED flashes red more than I'd like (every 5 to 10 seconds or so).

Then I shut down the Windows client and decided to run both client and server on the Rpi. At first, I got the same behavior as described above.

After running for, say, ten minutes, things settled to some kind of steady state. The analyzer console shows most of the dots (I don't really know what they represent) near the top. The ping time now alternates between 0 and 9 or 10 ms, like in the experiment I described before you changed the code. The jitter LED stays mostly green. Jitter buffer sizes are pretty much fixed at 2 for both client and server.

Message at top of analyzer console has "server clock drift" steady at -0.139101. Receive packet drops steady at 0.0000. Looks like it has reached a stable state, except for ping time alternating between 0 and 9 or 10 ms. Is its computation somehow being messed up?

Update: Ran the same experiment again with both client and server on a Rpi. It takes about 5 minutes for all the dots on the analyzer console to move to the top. They mostly stay low, and then all of a sudden they transition to the top over a few seconds. And this is when the ping time starts alternating between 0 and 9 or 10 ms. The jitter LED stays steady green, buffer sizes at 2 and 2. Server clock drift steady at -0.175120. Receive packet drops = 0.0000.

Note: I have "Enable small network buffers" turned on, and Buffer delay selection of 64.

Hope this helps.

bflamig avatar Aug 27 '20 18:08 bflamig

@bflamig : Server clock drift -0.175120: this likely means there is something wrong with the clock on the RPI. The clock is 17% off 48kHz if my measurements are correct. Can you check the clock settings a bit? My RPI3 can run at both 600 MHz and 1200 MHz. Is this an RPI3?
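
As an illustration, a rough reading of that drift number, assuming it is expressed as a fraction of the nominal 48 kHz sample rate (an assumption; the thread does not define the units):

```cpp
// Assumption: the reported drift is a fraction of the nominal sample rate.
constexpr double nominalRateHz = 48000.0;
constexpr double reportedDrift = -0.175120;
constexpr double deviationHz   = nominalRateHz * reportedDrift;            // ~ -8406 Hz
constexpr double effectiveRate = nominalRateHz * ( 1.0 + reportedDrift );  // ~ 39594 Hz, i.e. ~17.5% below 48 kHz
```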

I've seen something similar over here running my RPI3 in 1.2 GHz mode.

hselasky avatar Aug 27 '20 19:08 hselasky

@bflamig : Please re-test Windows build. Seems to work over here with USB headset. Make sure you run the same code on server and client.

Thank you!

hselasky avatar Aug 27 '20 20:08 hselasky

I have a Rpi4 8GB. I do have other models I can test on too. How do I check the clock settings? Or do you mean the CPU frequency?

I'll try out the latest build.

bflamig avatar Aug 27 '20 23:08 bflamig

@hselasky: Tried out your latest code. Compiled for both Windows 10 and the Rpi. Server running on the Rpi, client on Windows 10. Both on the local network:

After settling for a few minutes, the W10 client shows a ping time of 0-1ms, mostly 0. Overall delay from around 12ms to 15ms, sometimes more. ASIO buffer setting at 64. The jitter buffers both vary mostly between 2 and 4. The jitter LED goes red about every 5 seconds, sometimes more. Clock drift at -0.000164, not varying much.

Note that both machines are connected to a fast network switch and then to a fast router. I've been meaning to remove the switch, but it means reconfiguring some cables. Don't want to do that right now.

Then I tried running a client on the Rpi4 as well as the server. Pretty much the same behavior as the Windows 10 client, but clock drift after a few minutes was around 0.001, about 10 times higher than the Windows machine. The ping time varied between 0 and 4 ms at first.

After, say, 10-15 minutes, things start to go haywire. Clock drift goes rapidly up to -0.033816 and stays there. The dots on the analyzer console are near the top. And now we're back to the ping time alternating between 0 and 10 ms, pretty much every update (1/2 sec? 1 sec?). Jitter buffer sizes stuck at 4 local, 2 server. CPU load average roughly 0.32 as reported by htop. (I don't know what that number means.)

The Rpi4 has min and max frequencies of 600 MHz and 1.5 GHz. When running both client and server on the Rpi4, "vcgencmd measure_clock arm" measures between 750 MHz and 1.5 GHz depending on what else I'm doing on the machine. The latter occurred when I was also running a client on Windows just to see what happens. I disconnected it about 10 minutes before I took the 750 MHz measurement. Temp 47 degC.

bflamig avatar Aug 28 '20 01:08 bflamig

Clock drift goes rapidly up to -0.033816 and stays there. The dots on the analyzer console are near the top.

It clearly shows there is a problem with the RPI4. Are you able to fix the CPU frequency to 600 MHz? Are you able to run the server on your Windows machine, just via localhost (127.0.0.1), and then connect there for 10 minutes?

hselasky avatar Aug 28 '20 10:08 hselasky

I have a Rpi4 8GB. I do have other models I can test on too. How do I check the clock settings? Or do you mean the CPU frequency?

I'll try out the latest build.

https://raspberrypi.stackexchange.com/questions/1219/how-do-i-determine-the-current-mhz

hselasky avatar Aug 28 '20 10:08 hselasky

@hselasky:

I'll give running localhost on my Windows machine a try and see what happens.

Seems to me the Rpi4, as is, without making any changes to it, makes for a good test. Maybe the clock is all over the place on it, but that's going to be the real world for lots of users. Are you wanting a test at a fixed frequency simply as a way of "peeling the onion" as many of us like to say (getting to the heart of the matter by peeling away layers of complexity)?

I want to point out too that it could be I wasn't downloading the code properly. I'm fairly new to working with remote repositories, and I discovered this morning, when downloading your floating point branch, that I was getting the wrong versions of files. (It had, for example, the jitter buffer stuff, which wasn't supposed to be there.) I discovered I was checking out branches improperly and was getting a weird mix of stuff. (I think I was probably getting your master branch instead of just the floating_point branch.)

So that means I should re-test the jitter_buffer version just to make sure wires weren't being crossed.

bflamig avatar Aug 28 '20 18:08 bflamig

Running localhost loopback on Windows, results: never goes unstable. Jitter buffer settings vary from 2 to 4 on auto. Occasional dropouts, as indicated by the jitter LED and audible clicks and pops. It never seems to go more than 10 seconds without having them. Ping time steady at 0 ms.

Running localhost loopback on the Rpi4: the first attempt went unstable after about 5 minutes, ping time constantly alternating between 0 and other values. Dots pegged in the analyzer console.

I tried setting the Rpi4 clock to 600 MHz (using a setting in /boot/config.txt). Results: ping time not stable from the very start (it should be 0!), alternating as before. Things went unstable within a minute! Dots pegged in the analyzer console. Server clock drift -0.196971. Jitter units at 14 local, 2 server.

bflamig avatar Aug 28 '20 20:08 bflamig

I'm moving this back to backlog as I don't think anybody is working on it. I'll also mention two related, but possibly not identical, issues/discussions: #1404, #1054

hoffie avatar Mar 31 '21 15:03 hoffie