aes67-linux-daemon icon indicating copy to clipboard operation
aes67-linux-daemon copied to clipboard

Latency test fails with Linux kernel starting from 5.10.0x

Open MichalBury opened this issue 3 years ago • 25 comments

Hello,

I made some test on Raspberry4 and CM4. I have still XRUN when running latency_test. I have tried various Raspberry modules, Raspbian, Ubuntu etc. Tryed stock kernels, also preempt kernels. Can some share good working setup for Low latency usage on Raspberry?

Some info from latest test :

  • ubuntu 21.10
  • stock kernel Linux 5.13.0-1025-raspi aarch64

./run_test.sh S16_LE 48000 2 1

Compiling tools ... make: Nothing to be done for 'all'. Creating test file ... test file created Creating configuration files .. net.ipv4.igmp_max_memberships = 66 Starting PTP master ... Starting AES67 daemon ... Waiting for PTP slave to sync ... Starting to record 60 sec from sink ... Recording raw data '/tmp/sink_test.raw' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo Starting to playback test on source ... Recording to file "test/sink_test.raw" successfull Test result: ok Terminating processes ... daemon exiting with code: 0

./run_latency_test.sh S16_LE 48000 2 60 128

Compiling tools ... make: Nothing to be done for 'all'. Using buffer size of 128 frames Creating configuration files .. net.ipv4.igmp_max_memberships = 66 Starting PTP master ... Starting AES67 daemon ... Waiting for PTP slave to sync ... Running 10 secs latency test Scheduler set to Round Robin with priority 99... Playback device is hw:RAVENNA Capture device is hw:RAVENNA Parameters are 48000Hz, S16_LE, 2 channels, non-blocking mode Poll mode: no Loop limit is 2880000 frames, minimum latency = 64, maximum latency = 64 Hardware PCM card 1 'Merging RAVENNA' device 0 subdevice 0 Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 16 buffer_size : 128 period_size : 64 period_time : 1333 tstamp_mode : NONE tstamp_type : MONOTONIC period_step : 1 avail_min : 64 period_event : 0 start_threshold : 2147483647 stop_threshold : 128 silence_threshold: 0 silence_size : 0 boundary : 4611686018427387904 appl_ptr : 0 hw_ptr : 0 Hardware PCM card 1 'Merging RAVENNA' device 0 subdevice 0 Its setup is: stream : CAPTURE access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 16 buffer_size : 128 period_size : 64 period_time : 1333 tstamp_mode : NONE tstamp_type : MONOTONIC period_step : 1 avail_min : 64 period_event : 0 start_threshold : 2147483647 stop_threshold : 128 silence_threshold: 0 silence_size : 0 boundary : 4611686018427387904 appl_ptr : 0 hw_ptr : 0 Trying latency 128 frames, 2666.667us, 2.666667ms (375.0000Hz) Failure Playback: *** frames = 69888 *** state : XRUN trigger_time: 5053.170167 tstamp : 0.000000 delay : 0 avail : 64 avail_max : 64 Capture: *** frames = 69760 *** state : XRUN trigger_time: 5053.170169 tstamp : 0.000000 delay : 0 avail : 128 avail_max : 128 Maximum read: 64 frames Maximum read latency: 1333.333us, 1.333333ms (750.0000Hz) End to end latency: 6.665 msecs Terminating processes ... daemon exiting with code: 0

MichalBury avatar Apr 25 '22 14:04 MichalBury

something is weird with your execution. From the traces you enclosed you have: Loop limit is 2880000 frames, minimum latency = 64, maximum latency = 64 whereas is should be: Loop limit is 2880000 frames, minimum latency = 128, maximum latency = 128 Are you using the latest daemon sources from master branch ?

bondagit avatar Apr 26 '22 18:04 bondagit

Yes I have double checked that the latest branch is used, I have simple done git clone https://github.com/bondagit/aes67-linux-daemon

and runned build.sh

MichalBury avatar Apr 26 '22 19:04 MichalBury

Hello,

i have done one more test with Raspbian 64bit, installed from new SD card. The result is the same "XRUN". The "Loop limit" is now OK.

Kernel: 5.15.32-v8+ #1538 SMP PREEMPT

./run_test.sh S16_LE 48000 2 1

Creating test file ... test file created Creating configuration files .. net.ipv4.igmp_max_memberships = 66 Starting PTP master ... Starting AES67 daemon ... Waiting for PTP slave to sync ... Starting to record 60 sec from sink ... Recording raw data '/tmp/sink_test.raw' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo Starting to playback test on source ... Recording to file "test/sink_test.raw" successfull Test result: ok Terminating processes ... daemon exiting with code: 0

./run_latency_test.sh S16_LE 48000 2 60 128

Compiling tools ... make: Nothing to be done for 'all'. Using buffer size of 128 frames Creating configuration files .. net.ipv4.igmp_max_memberships = 66 Starting PTP master ... Starting AES67 daemon ... Waiting for PTP slave to sync ... Running 10 secs latency test Scheduler set to Round Robin with priority 99... Playback device is hw:RAVENNA Capture device is hw:RAVENNA Parameters are 48000Hz, S16_LE, 2 channels, non-blocking mode Poll mode: no Loop limit is 2880000 frames, minimum latency = 128, maximum latency = 128 Hardware PCM card 2 'Merging RAVENNA' device 0 subdevice 0 Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 16 buffer_size : 128 period_size : 64 period_time : 1333 tstamp_mode : NONE tstamp_type : MONOTONIC period_step : 1 avail_min : 64 period_event : 0 start_threshold : 2147483647 stop_threshold : 128 silence_threshold: 0 silence_size : 0 boundary : 4611686018427387904 appl_ptr : 0 hw_ptr : 0 Hardware PCM card 2 'Merging RAVENNA' device 0 subdevice 0 Its setup is: stream : CAPTURE access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 16 buffer_size : 128 period_size : 64 period_time : 1333 tstamp_mode : NONE tstamp_type : MONOTONIC period_step : 1 avail_min : 64 period_event : 0 start_threshold : 2147483647 stop_threshold : 128 silence_threshold: 0 silence_size : 0 boundary : 4611686018427387904 appl_ptr : 0 hw_ptr : 0 Trying latency 128 frames, 2666.667us, 2.666667ms (375.0000Hz) Failure Playback: *** frames = 45568 *** state : XRUN trigger_time: 2129.030265 tstamp : 0.000000 delay : 0 avail : 64 avail_max : 64 Capture: *** frames = 45440 *** state : XRUN trigger_time: 2129.030266 tstamp : 0.000000 delay : 0 avail : 128 avail_max : 128 Maximum read: 64 frames Maximum read latency: 1333.333us, 1.333333ms (750.0000Hz) End to end latency: 7.999 msecs Terminating processes ... daemon exiting with code: 0

MichalBury avatar Apr 27 '22 08:04 MichalBury

you can try to increase the latency, for example to 256 frames: ./run_latency_test.sh S16_LE 48000 2 60 256 If this is not working, you can try to double the latency at each step.

bondagit avatar Apr 27 '22 08:04 bondagit

tried up to 2048. still same result. but I need to stay at 128 to have "end to en latency" max 10ms.

So I'm wondering. Seen post that it's runnig on RPi4 with low latency without problems. Checked the Kernel setup. It's Preemp, Config_hz=250.

MichalBury avatar Apr 27 '22 08:04 MichalBury

I don't have this board so I cannot test directly. Can you report the output of: cat /proc/cpuinfo and uname -a

bondagit avatar Apr 27 '22 08:04 bondagit

required info:

cat /proc/cpuinfo

processor : 0 BogoMIPS : 108.00 Features : fp asimd evtstrm crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd08 CPU revision : 3

processor : 1 BogoMIPS : 108.00 Features : fp asimd evtstrm crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd08 CPU revision : 3

processor : 2 BogoMIPS : 108.00 Features : fp asimd evtstrm crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd08 CPU revision : 3

processor : 3 BogoMIPS : 108.00 Features : fp asimd evtstrm crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd08 CPU revision : 3

Hardware : BCM2835 Revision : c03140 Serial : 1000000033b333ce Model : Raspberry Pi Compute Module 4 Rev 1.0

uname -a Linux aes67test 5.15.32-v8+ #1538 SMP PREEMPT Thu Mar 31 19:40:39 BST 2022 aarch64 GNU/Linux

MichalBury avatar Apr 27 '22 09:04 MichalBury

I don't see anything evident unfortunately. Everything seems ok.

bondagit avatar Apr 27 '22 10:04 bondagit

You have written that you runned the test on NanoPi NEO2. Can you share your kernel config and version which was used for test. thank you.

MichalBury avatar Apr 28 '22 07:04 MichalBury

I used a NanoPI NEO2.

uname -a

Linux nanopineo2 5.8.8-sunxi64 #trunk SMP Sun Sep 13 01:50:54 PDT 2020 aarch64 aarch64 aarch64 GNU/Linux

In enclosed the kernel config I used. config.gz

bondagit avatar Apr 28 '22 07:04 bondagit

I also have this problem with regular desktop computers, I'm guessing something changed in newer kernels that affect latency. Thanks for the config, I will also try on intel and ryzen.

I can't test below 128 frame. Any ideia why? If I specify e.g. 64, the test is still done for 128. Is it an Alsa thing?

Thanks

neoscopio avatar Apr 29 '22 18:04 neoscopio

Any ideia why? If I specify e.g. 64, the test is still done for 128. Is it an Alsa thing?

The driver works with a tick frame size set by the parameter: "tic_frame_size_at_1fs": 64 in general on Linux it doesn't make any sense to go below 48: all my attempts to go below 1ms @ 48Khz failed. Having clarified this we need at least two buffers of tic_frame_size_at_1fs frames each to avoid under-run on the player/capture sides. So the driver won't allow any buffer size programming below (2 * tic_frame_size_at_1fs), 128 frames in this case.

bondagit avatar Apr 29 '22 18:04 bondagit

If you change "tic_frame_size_at_1fs" to 48 in test/daemon.conf you can run a test with 96 frames and this is the minimum latency I could successfully achieve on a NanoPI NEO2 board.

./run_latency_test.sh S16_LE 48000 2 60 96 
Compiling tools ...
make: Nothing to be done for 'all'.
Using buffer size of 96 frames
Creating configuration files ..
net.ipv4.igmp_max_memberships = 66
Starting PTP master ...
Starting AES67 daemon ...
Waiting for PTP slave to sync ...
Running 10 secs latency test
Scheduler set to Round Robin with priority 99...
Playback device is hw:RAVENNA
Capture device is hw:RAVENNA
Parameters are 48000Hz, S16_LE, 2 channels, non-blocking mode
Poll mode: no
Loop limit is 2880000 frames, minimum latency = 96, maximum latency = 96
Hardware PCM card 1 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
  stream       : PLAYBACK
  access       : RW_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 16
  buffer_size  : 96
  period_size  : 48
  period_time  : 1000
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 48
  period_event : 0
  start_threshold  : 2147483647
  stop_threshold   : 96
  silence_threshold: 0
  silence_size : 0
  boundary     : 6917529027641081856
  appl_ptr     : 0
  hw_ptr       : 0
Hardware PCM card 1 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
  stream       : CAPTURE
  access       : RW_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 16
  buffer_size  : 96
  period_size  : 48
  period_time  : 1000
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 48
  period_event : 0
  start_threshold  : 2147483647
  stop_threshold   : 96
  silence_threshold: 0
  silence_size : 0
  boundary     : 6917529027641081856
  appl_ptr     : 0
  hw_ptr       : 0
Trying latency 96 frames, 2000.000us, 2.000000ms (500.0000Hz)
Success
Playback:
*** frames = 2880096 ***
  state       : RUNNING
  trigger_time: 5438.701330
  tstamp      : 0.000000
  delay       : 96
  avail       : 0
  avail_max   : 48
Capture:
*** frames = 2880000 ***
  state       : RUNNING
  trigger_time: 5438.701334
  tstamp      : 0.000000
  delay       : 0
  avail       : 0
  avail_max   : 48
Maximum read: 48 frames
Maximum read latency: 1000.000us, 1.000000ms (1000.0000Hz)
Playback time = 5438.701330, Record time = 5438.701334, diff = -4
End to end latency: 5.998 msecs
Terminating processes ...
daemon exiting with code: 0

bondagit avatar Apr 29 '22 18:04 bondagit

The reason I decided to set the "tic_frame_size_at_1fs" to 64 in the default configuration was to keep the driver ALSA interface usable via JACK (number of frames between JACK calls must be a power of 2)

bondagit avatar Apr 29 '22 18:04 bondagit

I've made a test on my Server with 2 x Xeon L5640 (24 Cores). It's running Ubuntu 20.04 LTS with Kernel 5.4.0-109-generic. Compiled and runned Latency test. Everything was OK.

The I have installed latest released kernel for this system 5.13.0-41-generic. Recompiled the project. Now I have "XRUN" on latency test.

So it seems that something major has changed in kernel. Will try to compile 5.8.8 Kernel version for Raspberry and make a test.

MichalBury avatar May 01 '22 19:05 MichalBury

Thanks for sharing. This is interesting information.

bondagit avatar May 01 '22 19:05 bondagit

I've made a test on my Server with 2 x Xeon L5640 (24 Cores). It's running Ubuntu 20.04 LTS with Kernel 5.4.0-109-generic. Compiled and runned Latency test. Everything was OK.

The I have installed latest released kernel for this system 5.13.0-41-generic. Recompiled the project. Now I have "XRUN" on latency test.

So it seems that something major has changed in kernel. Will try to compile 5.8.8 Kernel version for Raspberry and make a test.

This is confirmed also on my N40 mini PC with Intel® Celeron® Processor N4020 , 2 Cores/2 Threads: the latency tests fail systematically with both kernel 5.11.0-49-generic and 5.11.0-49-lowlatency and they work instead with kernel 5.8.9-050809-generic.

bondagit avatar May 02 '22 07:05 bondagit

I have made the same test on Raspberry PI4 with three different stock kernels. 5.15.32 - latency test XRUN 5.10.17 - latency test XRUN 5.4.79 - latency test OK

So now I need to find precisely from which Kernel version it's not running properly.

MichalBury avatar May 03 '22 08:05 MichalBury

it worked for me with last kernel of 5.9.x. So my guess is that the problem started at some point of 5.10.x. When we find where the problem started it should be easier to find the root cause of the issue.

bondagit avatar May 03 '22 08:05 bondagit

I'm relative new to the Pi when it comes to compile or change the kernel. I'm a bit embarrassed, but I don't know how to change the kernel version. Regular apt install throws an error about creating links, same kernel gets loaded. Can someone point me to a tutorial? Do I really need to change the kernel by mounting the SD card in another computer?

Thanks

neoscopio avatar May 04 '22 09:05 neoscopio

Btw: I've tryed mitigations=off kernel flag on desktop, but stil got xruns for frames =< 2048.

neoscopio avatar May 04 '22 09:05 neoscopio

Can someone point me to a tutorial?

in general it depends on the distro you are using if they offer some easy way to change kernel. For example for Ubuntu you can use: https://www.how2shout.com/linux/how-to-change-default-kernel-in-ubuntu-22-04-20-04-lts/

bondagit avatar May 04 '22 13:05 bondagit

For example for Ubuntu you can use: https://www.how2shout.com/linux/how-to-change-default-kernel-in-ubuntu-22-04-20-04-lts/

Thanks, my problem is that is not working for the Raspberry Pi. No grub, for one. Nm, I'll get around it.

neoscopio avatar May 05 '22 07:05 neoscopio

I did some more tests and the XRUN issue starts from kernel 5.10.0 on. The issue happens for both X86 and ARM platforms. For this reason, for the time being, I suggest using a kernel version <= 5.9.x

bondagit avatar May 07 '22 11:05 bondagit

Thanks, my problem is that is not working for the Raspberry Pi. No grub, for one. Nm, I'll get around it.

yes, usually on ARM boards in addition to the kernel and the headers you have to install the kernel dtb too. After these you cloud also have to update your bootloader configuration (usually U-boot). For armbian you can find instruction here: https://www.armbian.com/nanopi-neo-2/

bondagit avatar May 07 '22 11:05 bondagit

Another data point: If I build the upstream driver (master) with this patch only:

https://bitbucket.org/MergingTechnologies/ravenna-alsa-lkm/pull-requests/5

Running it in combination with their daemon, I don't hear the glitching due to this latency issue on Kernel 5.18. There seems to be incompatibility in the API which prevents that build of the driver being tested with the daemon from this repo:

http_server:: GET /cometd/ response 404

I do hear the issue running the driver/daemon built from this repo. I can see some additional driver code changes upstream that I suspect would be worth investigating the relevance of.

ArrEssJay avatar Oct 10 '22 02:10 ArrEssJay

The problem reproduces on multiple audio cards by using the original latency application in alsa-lib. So it doesn't seem to be related to the RAVENNA driver.

bondagit avatar Nov 22 '22 20:11 bondagit

perexg commented 11 minutes ago

I think that I found the upstream change between 5.9 and 5.10 kernels in the scheduler which causes this behavior: https://lore.kernel.org/lkml/c596a06773658d976fb839e02843a459ed4c2edf.1479204252.git.bristot@redhat.com/ See the patch description. You can set the old behaviour using this command: sysctl -w kernel.sched_rt_runtime_us=1000000 Note that this command may be dangerous - see the patch description. The task can eat whole CPU preventing to run other tasks.

bondagit avatar Nov 23 '22 19:11 bondagit

I will run some more tests and add this setting to the documentation and scripts. On kernel >= 5.10.x use the following setting to restore old kernel scheduler behaviour:

sysctl -w kernel.sched_rt_runtime_us=1000000

bondagit avatar Nov 23 '22 19:11 bondagit

@MichalBury can you checkout branch iissue-79 and verify if the latency test works for you ?

bondagit avatar Nov 25 '22 09:11 bondagit