Latency test fails with Linux kernel starting from 5.10.x
Hello,
I ran some tests on a Raspberry Pi 4 and a CM4, and I still get XRUN when running the latency test. I have tried various Raspberry Pi modules, Raspbian, Ubuntu etc., with both stock and preempt kernels. Can someone share a known-good setup for low-latency usage on a Raspberry Pi?
Some info from the latest test:
- ubuntu 21.10
- stock kernel Linux 5.13.0-1025-raspi aarch64
./run_test.sh S16_LE 48000 2 1
Compiling tools ...
make: Nothing to be done for 'all'.
Creating test file ...
test file created
Creating configuration files ..
net.ipv4.igmp_max_memberships = 66
Starting PTP master ...
Starting AES67 daemon ...
Waiting for PTP slave to sync ...
Starting to record 60 sec from sink ...
Recording raw data '/tmp/sink_test.raw' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
Starting to playback test on source ...
Recording to file "test/sink_test.raw" successfull
Test result: ok
Terminating processes ...
daemon exiting with code: 0
./run_latency_test.sh S16_LE 48000 2 60 128
Compiling tools ...
make: Nothing to be done for 'all'.
Using buffer size of 128 frames
Creating configuration files ..
net.ipv4.igmp_max_memberships = 66
Starting PTP master ...
Starting AES67 daemon ...
Waiting for PTP slave to sync ...
Running 10 secs latency test
Scheduler set to Round Robin with priority 99...
Playback device is hw:RAVENNA
Capture device is hw:RAVENNA
Parameters are 48000Hz, S16_LE, 2 channels, non-blocking mode
Poll mode: no
Loop limit is 2880000 frames, minimum latency = 64, maximum latency = 64
Hardware PCM card 1 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
stream : PLAYBACK
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 128
period_size : 64
period_time : 1333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 64
period_event : 0
start_threshold : 2147483647
stop_threshold : 128
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
Hardware PCM card 1 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
stream : CAPTURE
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 128
period_size : 64
period_time : 1333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 64
period_event : 0
start_threshold : 2147483647
stop_threshold : 128
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
Trying latency 128 frames, 2666.667us, 2.666667ms (375.0000Hz)
Failure
Playback:
*** frames = 69888 ***
state : XRUN
trigger_time: 5053.170167
tstamp : 0.000000
delay : 0
avail : 64
avail_max : 64
Capture:
*** frames = 69760 ***
state : XRUN
trigger_time: 5053.170169
tstamp : 0.000000
delay : 0
avail : 128
avail_max : 128
Maximum read: 64 frames
Maximum read latency: 1333.333us, 1.333333ms (750.0000Hz)
End to end latency: 6.665 msecs
Terminating processes ...
daemon exiting with code: 0
Something is weird with your execution. From the traces you enclosed you have:
Loop limit is 2880000 frames, minimum latency = 64, maximum latency = 64
whereas it should be:
Loop limit is 2880000 frames, minimum latency = 128, maximum latency = 128
Are you using the latest daemon sources from the master branch?
Yes, I have double-checked that the latest branch is used. I simply did git clone https://github.com/bondagit/aes67-linux-daemon
and ran build.sh.
Hello,
I have done one more test with Raspbian 64-bit, installed on a fresh SD card. The result is the same "XRUN". The "Loop limit" is now OK.
Kernel: 5.15.32-v8+ #1538 SMP PREEMPT
./run_test.sh S16_LE 48000 2 1
Creating test file ...
test file created
Creating configuration files ..
net.ipv4.igmp_max_memberships = 66
Starting PTP master ...
Starting AES67 daemon ...
Waiting for PTP slave to sync ...
Starting to record 60 sec from sink ...
Recording raw data '/tmp/sink_test.raw' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
Starting to playback test on source ...
Recording to file "test/sink_test.raw" successfull
Test result: ok
Terminating processes ...
daemon exiting with code: 0
./run_latency_test.sh S16_LE 48000 2 60 128
Compiling tools ...
make: Nothing to be done for 'all'.
Using buffer size of 128 frames
Creating configuration files ..
net.ipv4.igmp_max_memberships = 66
Starting PTP master ...
Starting AES67 daemon ...
Waiting for PTP slave to sync ...
Running 10 secs latency test
Scheduler set to Round Robin with priority 99...
Playback device is hw:RAVENNA
Capture device is hw:RAVENNA
Parameters are 48000Hz, S16_LE, 2 channels, non-blocking mode
Poll mode: no
Loop limit is 2880000 frames, minimum latency = 128, maximum latency = 128
Hardware PCM card 2 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
stream : PLAYBACK
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 128
period_size : 64
period_time : 1333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 64
period_event : 0
start_threshold : 2147483647
stop_threshold : 128
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
Hardware PCM card 2 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
stream : CAPTURE
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 128
period_size : 64
period_time : 1333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 64
period_event : 0
start_threshold : 2147483647
stop_threshold : 128
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
Trying latency 128 frames, 2666.667us, 2.666667ms (375.0000Hz)
Failure
Playback:
*** frames = 45568 ***
state : XRUN
trigger_time: 2129.030265
tstamp : 0.000000
delay : 0
avail : 64
avail_max : 64
Capture:
*** frames = 45440 ***
state : XRUN
trigger_time: 2129.030266
tstamp : 0.000000
delay : 0
avail : 128
avail_max : 128
Maximum read: 64 frames
Maximum read latency: 1333.333us, 1.333333ms (750.0000Hz)
End to end latency: 7.999 msecs
Terminating processes ...
daemon exiting with code: 0
You can try to increase the latency, for example to 256 frames:
./run_latency_test.sh S16_LE 48000 2 60 256
If this does not work, try doubling the latency at each step.
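A minimal shell sketch of that procedure, assuming run_latency_test.sh exits with a non-zero status when the test fails (I have not verified its exit-code handling, so treat this as illustrative):

```shell
# Sketch: re-run the latency test, doubling the buffer size each time,
# and stop at the first size that succeeds.
for frames in 128 256 512 1024 2048; do
    if ./run_latency_test.sh S16_LE 48000 2 60 "$frames"; then
        echo "first working buffer size: $frames frames"
        break
    fi
done
```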
I tried up to 2048, still the same result, but I need to stay at 128 to keep the end-to-end latency below 10 ms.
So I'm wondering: I've seen posts saying it runs on an RPi 4 with low latency without problems. I checked the kernel setup: it's PREEMPT, CONFIG_HZ=250.
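For reference, one way to check those two options on a running kernel (paths vary by distro; /proc/config.gz requires CONFIG_IKCONFIG_PROC, while Debian-style kernels ship a /boot/config-$(uname -r) file):

```shell
# Show the preemption model and timer frequency of the running kernel.
# Falls back from /proc/config.gz to the distro config file in /boot.
{ zcat /proc/config.gz 2>/dev/null || cat "/boot/config-$(uname -r)"; } \
    | grep -E '^CONFIG_PREEMPT|^CONFIG_HZ'
```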
I don't have this board so I cannot test directly.
Can you report the output of:
cat /proc/cpuinfo
and
uname -a
Requested info:
cat /proc/cpuinfo
processor : 0
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 1
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 2
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 3
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

Hardware : BCM2835
Revision : c03140
Serial : 1000000033b333ce
Model : Raspberry Pi Compute Module 4 Rev 1.0
uname -a Linux aes67test 5.15.32-v8+ #1538 SMP PREEMPT Thu Mar 31 19:40:39 BST 2022 aarch64 GNU/Linux
I don't see anything evident unfortunately. Everything seems ok.
You wrote that you ran the test on a NanoPi NEO2. Can you share the kernel config and version used for that test? Thank you.
I used a NanoPI NEO2.
uname -a
Linux nanopineo2 5.8.8-sunxi64 #trunk SMP Sun Sep 13 01:50:54 PDT 2020 aarch64 aarch64 aarch64 GNU/Linux
Enclosed is the kernel config I used: config.gz
I also have this problem with regular desktop computers, so I'm guessing something changed in newer kernels that affects latency. Thanks for the config; I will also try on Intel and Ryzen.
I can't test below 128 frames. Any idea why? If I specify e.g. 64, the test is still done at 128. Is it an ALSA thing?
Thanks
The driver works with a tick frame size set by the parameter:
"tic_frame_size_at_1fs": 64
In general on Linux it doesn't make sense to go below 48: all my attempts to go below 1 ms @ 48 kHz failed.
Having clarified this, we need at least two buffers of tic_frame_size_at_1fs frames each to avoid under-runs on the playback/capture sides.
So the driver won't allow any buffer size programming below (2 * tic_frame_size_at_1fs), 128 frames in this case.
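To make the numbers concrete, a quick sketch of that constraint at 48 kHz (matching the "128 frames, 2.666667ms" line in the logs above):

```shell
# Minimum programmable buffer is 2 * tic_frame_size_at_1fs frames;
# convert that to milliseconds at the given sample rate.
TIC=64
RATE=48000
MIN_FRAMES=$((2 * TIC))
awk -v f="$MIN_FRAMES" -v r="$RATE" \
    'BEGIN { printf "%d frames = %.6f ms\n", f, f * 1000 / r }'
# prints: 128 frames = 2.666667 ms
```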
If you change "tic_frame_size_at_1fs" to 48 in test/daemon.conf you can run a test with 96 frames and this is the minimum latency I could successfully achieve on a NanoPI NEO2 board.
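For example (a sketch; this assumes test/daemon.conf is the JSON file containing the "tic_frame_size_at_1fs" key quoted above, formatted exactly as shown):

```shell
# Switch the tick size from the default 64 to 48 frames in the test
# configuration, which makes the 96-frame latency test possible.
sed -i 's/"tic_frame_size_at_1fs": 64/"tic_frame_size_at_1fs": 48/' test/daemon.conf
```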
./run_latency_test.sh S16_LE 48000 2 60 96
Compiling tools ...
make: Nothing to be done for 'all'.
Using buffer size of 96 frames
Creating configuration files ..
net.ipv4.igmp_max_memberships = 66
Starting PTP master ...
Starting AES67 daemon ...
Waiting for PTP slave to sync ...
Running 10 secs latency test
Scheduler set to Round Robin with priority 99...
Playback device is hw:RAVENNA
Capture device is hw:RAVENNA
Parameters are 48000Hz, S16_LE, 2 channels, non-blocking mode
Poll mode: no
Loop limit is 2880000 frames, minimum latency = 96, maximum latency = 96
Hardware PCM card 1 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
stream : PLAYBACK
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 96
period_size : 48
period_time : 1000
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 48
period_event : 0
start_threshold : 2147483647
stop_threshold : 96
silence_threshold: 0
silence_size : 0
boundary : 6917529027641081856
appl_ptr : 0
hw_ptr : 0
Hardware PCM card 1 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
stream : CAPTURE
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 96
period_size : 48
period_time : 1000
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 48
period_event : 0
start_threshold : 2147483647
stop_threshold : 96
silence_threshold: 0
silence_size : 0
boundary : 6917529027641081856
appl_ptr : 0
hw_ptr : 0
Trying latency 96 frames, 2000.000us, 2.000000ms (500.0000Hz)
Success
Playback:
*** frames = 2880096 ***
state : RUNNING
trigger_time: 5438.701330
tstamp : 0.000000
delay : 96
avail : 0
avail_max : 48
Capture:
*** frames = 2880000 ***
state : RUNNING
trigger_time: 5438.701334
tstamp : 0.000000
delay : 0
avail : 0
avail_max : 48
Maximum read: 48 frames
Maximum read latency: 1000.000us, 1.000000ms (1000.0000Hz)
Playback time = 5438.701330, Record time = 5438.701334, diff = -4
End to end latency: 5.998 msecs
Terminating processes ...
daemon exiting with code: 0
The reason I decided to set "tic_frame_size_at_1fs" to 64 in the default configuration was to keep the driver's ALSA interface usable via JACK (the number of frames between JACK calls must be a power of 2).
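As a quick sanity check of that power-of-two constraint, a small shell helper (illustrative only; the function name is made up here):

```shell
# Return success if $1 is a power of two (a JACK-compatible period size).
# Uses the classic n & (n - 1) == 0 trick for positive n.
is_pow2() { [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]; }

is_pow2 64 && echo "64: usable with JACK"
is_pow2 48 || echo "48: not a power of two, JACK cannot use it"
```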
I've run a test on my server with 2 x Xeon L5640 (24 cores). It's running Ubuntu 20.04 LTS with kernel 5.4.0-109-generic. I compiled and ran the latency test; everything was OK.
Then I installed the latest released kernel for this system, 5.13.0-41-generic, and recompiled the project. Now I get "XRUN" on the latency test.
So it seems that something major has changed in the kernel. I will try to compile kernel version 5.8.8 for the Raspberry and run a test.
Thanks for sharing. This is interesting information.
This is also confirmed on my N40 mini PC with an Intel® Celeron® N4020 processor, 2 cores/2 threads:
the latency tests fail systematically with both kernels 5.11.0-49-generic and 5.11.0-49-lowlatency, but work with kernel 5.8.9-050809-generic.
I have made the same test on a Raspberry Pi 4 with three different stock kernels:
5.15.32 - latency test XRUN
5.10.17 - latency test XRUN
5.4.79 - latency test OK
So now I need to find precisely from which kernel version it stops running properly.
It worked for me with the last kernel of the 5.9.x series, so my guess is that the problem started at some point in 5.10.x. Once we find where the problem started, it should be easier to find the root cause of the issue.
I'm relatively new to the Pi when it comes to compiling or changing the kernel. I'm a bit embarrassed, but I don't know how to change the kernel version. A regular apt install throws an error about creating links, and the same kernel gets loaded. Can someone point me to a tutorial? Do I really need to change the kernel by mounting the SD card in another computer?
Thanks
Btw: I've tried the mitigations=off kernel flag on the desktop, but still got XRUNs for frames <= 2048.
In general it depends on the distro you are using and whether it offers an easy way to change the kernel. For example, for Ubuntu you can use: https://www.how2shout.com/linux/how-to-change-default-kernel-in-ubuntu-22-04-20-04-lts/
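The gist of that tutorial, as a sketch for GRUB-based systems (the menu entry name below is an example, not taken from this thread; list your own entries first and adjust):

```shell
# 1. List the GRUB menu entries installed on the system.
grep -E "^\s*(menuentry|submenu) '" /boot/grub/grub.cfg | cut -d"'" -f2

# 2. Pin an older kernel as the default (entry name is an example),
#    then regenerate the GRUB configuration and reboot.
sudo sed -i "s|^GRUB_DEFAULT=.*|GRUB_DEFAULT='Advanced options for Ubuntu>Ubuntu, with Linux 5.8.9-050809-generic'|" /etc/default/grub
sudo update-grub
```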
Thanks, my problem is that this doesn't work for the Raspberry Pi. No GRUB, for one. Nm, I'll get around it.
I did some more tests and the XRUN issue starts from kernel 5.10.0 on. The issue happens for both X86 and ARM platforms. For this reason, for the time being, I suggest using a kernel version <= 5.9.x
Yes, usually on ARM boards, in addition to the kernel and the headers, you have to install the kernel DTB too. After that you could also have to update your bootloader configuration (usually U-Boot). For Armbian you can find instructions here: https://www.armbian.com/nanopi-neo-2/
Another data point: If I build the upstream driver (master) with this patch only:
https://bitbucket.org/MergingTechnologies/ravenna-alsa-lkm/pull-requests/5
Running it in combination with their daemon, I don't hear the glitching due to this latency issue on kernel 5.18. There seems to be an incompatibility in the API which prevents that build of the driver from being tested with the daemon from this repo:
http_server:: GET /cometd/ response 404
I do hear the issue running the driver/daemon built from this repo. I can see some additional driver code changes upstream that I suspect are worth investigating.
The problem reproduces on multiple audio cards by using the original latency application in alsa-lib. So it doesn't seem to be related to the RAVENNA driver.
perexg commented:
I think that I found the upstream change between the 5.9 and 5.10 kernels, in the scheduler, which causes this behavior: https://lore.kernel.org/lkml/c596a06773658d976fb839e02843a459ed4c2edf.1479204252.git.bristot@redhat.com/
See the patch description. You can restore the old behaviour using this command:
sysctl -w kernel.sched_rt_runtime_us=1000000
Note that this command may be dangerous - see the patch description. The task can eat the whole CPU, preventing other tasks from running.
I will run some more tests and add this setting to the documentation and scripts. On kernels >= 5.10.x, use the following setting to restore the old kernel scheduler behaviour:
sysctl -w kernel.sched_rt_runtime_us=1000000
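To make the setting survive reboots, it can also be dropped into sysctl.d (the file name here is arbitrary; the same caveat about the risks of this setting applies):

```shell
# Persist the scheduler setting across reboots via sysctl.d,
# then reload all sysctl configuration to apply it immediately.
echo 'kernel.sched_rt_runtime_us = 1000000' | sudo tee /etc/sysctl.d/90-aes67-rt.conf
sudo sysctl --system
```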
@MichalBury can you check out branch iissue-79 and verify whether the latency test works for you?