openal-soft

extreme crackling sound in OpenAL applications caused by period count = 3

Open ballerburg9005 opened this issue 2 years ago • 8 comments

I was able to resolve this problem simply by setting the period count to 4 inside alsoft-config (it seemed to default to 3).

As you can see from the following links:

  • Sound sample: https://vocaroo.com/1iXNooJWY3Ks
  • Original bug report: https://github.com/ZDoom/gzdoom/issues/1730

the issue is very severe and it took me several hours of skillful testing to trace back the origin of the problem. As OpenAL is a low-level programming API like OpenGL, users do not expect it to even have a config file and settings to change, and are probably not even aware of it. Also the situation is quite elusive, as people are led to believe from internet search results that they should change the relevant hardware settings from within their .asoundrc as a general approach, which even partially leads to success, but is just a fiddling nightmare and time vampire. On top of this many people use PulseAudio and other agents that only provide additional sources of error and complexity, which makes it even worse to debug.

Trying to find solutions to the issue online, I encountered tons of people reporting the issue, but often without being aware that the cause is or could be OpenAL. I even read about the solution once, but it was described so poorly on a forum that I mistook it as referring to .asoundrc settings. Personally I had experienced the issue for a year in Telegram-Desktop, but it totally puzzled me because I never would have assumed it to use OpenAL (it even says "alsa" in the settings iirc), and also I remember that other OpenAL applications like Chromium-bsu used to work properly some years ago. This was only further confounded by the fact that the issue would be entirely gone in Telegram (but not other OpenAL applications) every 20th time or so I started the program. I think I even made a bug report for Telegram that never got a response. So since I didn't use Telegram often anyway, I just avoided playing media in it.

I hope it becomes evident that even though many people are affected by this, and the issue can be solved by a user setting, people will not actually gain awareness over the cause of the problem and hence the solution remains inaccessible to them.

This is why I suggest that the default period count should be raised to 4 (or 6 or whatever works for everyone else), and period count 3 should only be available if set explicitly to that value.

ballerburg9005, Sep 05 '22 12:09

According to that original bug report, the issue seems to stem from the emu10k driver, and since you're having OpenAL use ALSA directly instead of PulseAudio or PipeWire, that increases the chance of OpenAL running into driver issues. The default buffer metrics of 20ms per period and 3 periods is already rather lenient, as it creates playback latency up to 60ms. Increasing the period count to 4 would increase the latency up to 80ms.
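For reference, the latency figures above are just period size times period count, divided by the sample rate. A minimal sketch of that arithmetic, assuming a 48000 Hz device where a 20ms period is 960 frames (the function name is illustrative and not part of OpenAL Soft):

```python
def buffer_latency_ms(period_frames: int, periods: int, rate_hz: int) -> float:
    """Worst-case playback latency of a completely full buffer, in milliseconds."""
    return period_frames * periods * 1000.0 / rate_hz


# Assumption: a 48000 Hz device, where one 20 ms period is 960 frames.
for periods in (3, 4):
    print(periods, "periods:", buffer_latency_ms(960, periods, 48000), "ms")
# 3 periods: 60.0 ms
# 4 periods: 80.0 ms
```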

Can you provide a trace log from an app showing the issue? Set the ALSOFT_LOGLEVEL environment variable to 3 and OpenAL will write an informational log to stderr that you can pipe to a file to paste here (or also set ALSOFT_LOGFILE to a full path+filename to have the log written to that file instead).
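One way to capture such a log is to launch the app with both variables set. The sketch below only uses the two environment variables named above; the helper names and the `my_openal_app` command are placeholders for illustration.

```python
import os
import subprocess


def alsoft_trace_env(logfile=None):
    """Copy of the current environment with OpenAL Soft logging turned up."""
    env = dict(os.environ)
    env["ALSOFT_LOGLEVEL"] = "3"  # informational log level, as requested above
    if logfile is not None:
        # ALSOFT_LOGFILE expects a full path plus filename.
        env["ALSOFT_LOGFILE"] = os.path.abspath(logfile)
    return env


def run_with_trace(command, logfile):
    """Run `command` (a list of arguments) with the log written to `logfile`."""
    return subprocess.run(command, env=alsoft_trace_env(logfile), check=False)


# Hypothetical usage; replace the command with the app that shows the issue:
# run_with_trace(["my_openal_app"], "openal_trace.txt")
```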

Something else to try instead of increasing the period count may be to disable mmap'ing for the ALSA backend.

kcat, Sep 05 '22 16:09

Thanks for your reply.

If anything the emu10k driver is not to blame here.

In my original bug report, I was referring to a very rare and different phenomenon with the emu10k sound cards that also happens in other sound cards (emu10k cards historically are just much more common). If you look at the specs of those cards they were quite expensive high-end cards back in the day and have e.g. native 24bits and 192kHz which can be used in a single or 4 different channels, which makes the normal native rate 48000 instead of 44100. As far as I know this can somehow cause some kind of mismatch issue if programs work with the hardware on a very low level, but they do not handle the hardware properly and just make false assumptions about it. As I mentioned in the report, the cure for this would be to set up a dmix, thus alsa would be forced to perform the resampling to match whatever the program assumes to be true, like a rate of 44100. As far as I understand it, resampling also introduces more latency thus you naturally have to increase the buffer or period size as well. However, as I also mentioned this fix had no effect on OpenAL. And it was finally confirmed in the end of the report that the issue with OpenAL is indeed entirely unrelated (or maybe only insofar related as that OpenAL might handle the card improperly as well, but at the same time also overrides low-level settings made through Alsa, breaking the known solution).

So as you can see the driver itself is in perfect order and neither the unrelated problem I mentioned, nor the problem with OpenAL stems from the driver.

If you search on the internet, just tons of people have sound cards that simply cannot handle ultra-low latency settings (whatever the reason) and they get crackling issues. No other sound API but OpenAL causes this problem, only bad code causes it in other programs. I can see how a setting of 3 maybe works for most people. Maybe this is because lots and lots of cards natively work with a rate of 44100, and don't have to do any resampling, so they can cope with lower latency settings? This is only a guess, I am not an expert.

But conversely if your chipset is simply unusual - high-end, professional, non-consumer, old or whatever - only consider then that it is almost impossible to trace down the origin of this problem for a normal user. If they don't know where the issue comes from, they can't file bug reports here or fix their program. The sound is so so bad, you just have to switch it off to endure it.

This is why I think the priority should be to switch to default settings that just work for everyone without totally breaking the sound for some. It would come quite naturally for people who cared about tweaking performance, or who perceived any amount of audible latency at all, to look up in the OpenAL docs how to get it to perform even faster. However the other way around, for people who use Telegram to fix broken sound when there are 10 different ways to describe how exactly it is broken, all leading to different solutions, and then they are sent on an odyssey to change 10 different low-level settings in alsa config, fix their Pulseaudio setup, toy with 10 different audio settings in Gzdoom or whatever program it is and so on and so forth ... this really is just madness by comparison and you can't expect anyone to cope with this situation other than by just walking away from it.

Wouldn't it be prudent for OpenAL to rather default to a high-latency value for maximum compatibility, and then it switches to a lower latency setting if it has confirmed by checks against the hardware (like e.g. a native rate of 44100, or something like that) that the sound card is likely to be able to handle a very low-latency setting?

Here are the logs diffs from supertuxcart (3 vs 6):

< [ALSOFT] (II) Pre-reset: *Stereo, *Float32, *44100hz, 2048 / 4096 buffer
> [ALSOFT] (II) Pre-reset: *Stereo, *Float32, *44100hz, 2048 / 12288 buffer
< [ALSOFT] (II) Post-reset: Stereo, Float32, 48000hz, 2229 / 4458 buffer
> [ALSOFT] (II) Post-reset: Stereo, Float32, 48000hz, 2229 / 15603 buffer
< [ALSOFT] (II) Post-start: Stereo, Float32, 48000hz, 2229 / 4458 buffer
> [ALSOFT] (II) Post-start: Stereo, Float32, 48000hz, 2229 / 15603 buffer

openal_supertuxcart_period_eq_3.txt openal_supertuxcart_period_eq_6.txt
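To compare such log lines numerically, the period and buffer sizes can be pulled out and converted to milliseconds. This is a rough sketch that assumes the line format shown in the diff above; the regular expression and function name are my own and may not match other OpenAL Soft versions.

```python
import re

# Matches lines such as:
# [ALSOFT] (II) Post-reset: Stereo, Float32, 48000hz, 2229 / 4458 buffer
LINE_RE = re.compile(
    r"\[ALSOFT\] \(II\) (?P<stage>[\w-]+): .*?(?P<rate>\d+)hz, "
    r"(?P<period>\d+) / (?P<buffer>\d+) buffer"
)


def parse_device_line(line):
    """Return (stage, rate, period, buffer, buffer_ms), or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rate, period, buf = int(m["rate"]), int(m["period"]), int(m["buffer"])
    return m["stage"], rate, period, buf, buf * 1000.0 / rate
```

Applied to the two Post-reset lines, this gives a total buffer of roughly 93ms for the first log and roughly 325ms for the second.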

They are just the same for other programs.

Changing the mmap setting for Alsa in alsoft-config had no effect whatsoever.

ballerburg9005, Sep 05 '22 19:09

Here are the logs diffs from supertuxcart (3 vs 6):

I notice they both have non-default settings period_size=2048 and rt-prio=0. It's more than doubled the period size, which seems to be causing the period count to reduce to 2 (to keep the overall buffer size roughly the same with the specified period size), and disabled real-time priority. These settings are definitely not doing you many favors, since the mixer needs to do more on each update and is more prone to being preempted, missing the update window.

Wouldn't it be prudent for OpenAL to rather default to a high-latency value for maximum compatibility, and then it switches to a lower latency setting if it has confirmed by checks against the hardware (like e.g. a native rate of 44100) that the sound card is likely to be able to handle the low-latency settings?

OpenAL Soft already uses relatively high latency settings. I occasionally get complaints that the default latency settings are too high, but I've been reluctant to reduce them much because of weaker or more problematic hardware (though where the defaults are now seems to be a good compromise; high enough to not be a problem for the vast majority of people, and low enough for latency to not be detectable by most people with general gameplay). Crackling issues tend to come from weak CPUs, or CPUs with aggressive power saving behavior, and CPU intensive processing (like full HRTF, reverb, several dozen sources, etc, all at once).

In most cases, OpenAL Soft doesn't interact directly with hardware, and instead uses a sound server/service like WASAPI, CoreAudio, PulseAudio, or PipeWire (or ALSA with dmix). There's no way to "confirm" that a sound device is good for low-latency settings, but even if it could, it would then also depend on the game/app and how many resources it uses (a game that is CPU heavy on its own may be more prone to low latency issues than a game that's not, and a game that uses many sound sources and effects simultaneously may be more prone to low latency issues than a game that doesn't).

kcat, Sep 05 '22 20:09

I changed those settings just now and it didn't make any difference.

The card can only handle a combined value of periods*period_size of 8192 well. 3*2048 or 6*1024 all work out to the same crackling noise. HRTF or rt-prio make no difference at all. So this isn't related to any special settings of the programs or my alsoft.conf. If period_size is supposed to be half the size you found in the log, that means that my distribution (Archlinux) has already doubled it with custom alsoft.conf, most likely in order to fight the same problem (however they did not set the period count).

I still think though this problem is something that is specifically caused by OpenAL, not only because it only happens in OpenAL applications. For example when I play a video in Firefox, I get the following output from cat /proc/asound/card0/pcm*/sub*/*_params | head -n 20

closed
closed
access: MMAP_INTERLEAVED
format: S16_LE
subformat: STD
channels: 2
rate: 48000 (48000/1)
**period_size: 1200**
**buffer_size: 4800**
tstamp_mode: NONE
period_step: 1
avail_min: 1200
start_threshold: 4800
stop_threshold: 4800
silence_threshold: 0
silence_size: 0
boundary: 5404319552844595200
closed
closed
closed
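The same parameters can be turned into milliseconds for easier comparison. A small sketch, assuming the `key: value` layout shown in the output above (the function names are illustrative):

```python
def parse_hw_params(text):
    """Parse 'key: value' lines from an ALSA *_params file into a dict."""
    params = {}
    for line in text.splitlines():
        # Lines like "closed" have no colon and are skipped; stray '*'
        # characters pasted around a line are tolerated.
        key, sep, value = line.strip().strip("*").partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


def timings_ms(params):
    """Return (period_ms, buffer_ms) from parsed hw_params."""
    rate = int(params["rate"].split()[0])  # "48000 (48000/1)" -> 48000
    period = int(params["period_size"])
    buffer_frames = int(params["buffer_size"])
    return period * 1000.0 / rate, buffer_frames * 1000.0 / rate
```

For the values listed above (rate 48000, period_size 1200, buffer_size 4800) this yields a 25ms period and a 100ms buffer.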

Doesn't this mean that Firefox is using even less latency than OpenAL and working just fine?

Those crackling noises are very excessive and predictable, like static noise but much more obnoxious, and not related to CPU usage, behavior or power. My system is quite beefy, on max performance and it is not like 1993 Doom or Telegram would amount to anything at all. Also I can tell you from experience with this card that CPU usage is entirely unrelated to the crackling caused by too small buffer sizes (e.g. from DOSBox). You would really have to nuke the CPU with extreme disk IO plus each core being at 100% from 10 different threads so that the cursor freezes, in order to cause the sound to skip or crackle in a game. I remember in the 90s such extreme CPU usage was not unusual, ... but today?! If crackling was only caused on very high CPU loads, like screen tearing, this would be much more tolerable and an entirely different situation. It is not like you have to switch sound off or uninstall the program in order to live with it.

I really don't know how widespread this issue is. But believe me the internet is really full of people being lost and filing bug reports, because buffer sizes are just cut too low and cause this very issue. A buffer size of 8192 is often then recommended. Not many programs use OpenAL, and identifying the true cause and solution is totally obscure and extremely remote to the average user.

I think it is a bit extreme to then leave people like me in this never ending nightmare situation, just because their sound cards are unusual. While 90% or so of people probably use the same kind of "AC97 compatible" onboard chipset, which is what makes the default setting work so well. Maybe this would be different if the drivers were coded differently, maybe it is complicated, I don't know about that. In the end though it is totally broken vs just a little bit more latency, that's quite a different impact.

Also consider that this is not just about games/entertainment. Desktop applications like Telegram are using OpenAL too (for whatever odd reason). Telegram has in places become a modern-day Facebook and WWW-alternative space, where people escape to in order to deal with government censorship, with no alternative programs to access the content.

My understanding of code for sound hardware is only crude, but from the little I have brushed against, I am sure there might be some quick & dirty ways to guesstimate if the sound hardware is simply unusual or not. Albeit there might be no way to make 100% sure whatever the card's exact limits are. If we only were able to detect whatever 1-2 chipset families that make up 90% of systems, or taking rate 44100 as an indicator, this would leave only "oddball" soundcards and 10x higher chances of problems with them to work against. To put those cards then indiscriminately in the high-latency category seems to me like a good compromise.

ballerburg9005, Sep 05 '22 22:09

If period_size is supposed to be half the size you found in the log, that means that my distribution (Archlinux) has already doubled it with custom alsoft.conf, most likely in order to fight the same problem (however they did not set the period count).

I'd be surprised if that configuration is from your distribution. Not only is it forcing ALSA with no real-time processing, it's also setting 44100hz, stereo, with HRTF forced enabled. That's hardly something you'd want on all users by default. If that is from your distribution, I do not know the reason behind it. It obviously doesn't fix your issue, so that's unlikely what it's intended for.

Doesn't this mean that Firefox is using even less latency than OpenAL and working just fine?

No, it's using a 1200 period size (25ms, where OpenAL uses 960 by default, 20ms), and a 4800 buffer size (100ms, where OpenAL uses 960*3 by default, 60ms). Of course, Firefox isn't so focused on dynamic real-time 3D audio, it's mostly just streaming playback (where the primary concern is syncing audio to video playback, less about overall latency). It needs very little CPU time to handle the actual output, and uses those rather high parameters to leave itself a lot of headroom.

Those crackling noises are very excessive and predictable, like static noise but much more obnoxious, and not related to CPU usage, behavior or power.

You'd be surprised. There have been plenty of capable CPUs that cause constant and consistent drop-outs like in that recording due to the CPU's power management putting OpenAL's mixing thread on a low-power core, and the thread/process not taking enough CPU time to make the CPU increase its performance.

I'm not saying that is what's happening to you. Just that when there are problems like this, that tends to be the cause. There can be other causes though, like imprecise hardware interrupts, or delayed thread wake-ups. If the CPU isn't the issue, I'd put my money on the hardware and/or driver. Creative isn't known for having made the best hardware, and the driver isn't likely as well-tested or maintained, so there can be issues with either the hardware not providing interrupts as expected, and/or the driver causing late wakeups (snd_pcm_wait returning much later than intended), leading to underruns since OpenAL isn't told in time that it needs to mix more audio. Larger buffers just work around the issue by giving a whole lot of extra time for when it's late to wake up, at the cost of more latency.

That's just a guess, though. I would need the help of someone who has the hardware and can troubleshoot both OpenAL's ALSA backend, and the kernel device driver, to find out what's going on.

But believe me the internet is really full of people being lost and filing bug reports, because buffer sizes are just cut too low and cause this very issue. A buffer size of 8192 is often then recommended.

A buffer size of 8192 is 171ms, and is way too high of a default for real-time gaming. It may be fine for plain video/audio playback, but not when you need sounds to respond dynamically to changes in gameplay. It's nearly 1/5th of a second.

I think it is a bit extreme to then leave people like me in this never ending nightmare situation, just because their sound cards are unusual. While 90% or so of people probably use the same kind of "AC97 compatible" onboard chipset, which is what makes the default setting work so well.

It's not just using an AC97 compatible chipset/driver, but also using ALSA directly. I wager using PulseAudio or PipeWire would also help, as they'll contain workarounds for common driver or hardware issues (so each individual app or audio library doesn't have to try to work around whatever same issues they find).

As for leaving you in "this never ending nightmare situation", OpenAL Soft provides plentiful configuration options to help you work around issues, as it seems you've done. And if I know what the issue actually is, I can try to implement better fixes, or add options to more exactly deal with the issue. But if the proposed solution is to use an arbitrarily large buffer size by default (3x more than the current default), without knowing the actual cause of the problem or whether it can show up again despite the "fix", that will be a tough sell.

kcat, Sep 06 '22 02:09

Ok, thanks for the answer. I am sorry if this issue is getting a bit long.

No, it's using a 1200 period size (25ms, where OpenAL uses 960 by default, 20ms), and a 4800 buffer size (100ms, where OpenAL uses 960*3 by default, 60ms). Of course, Firefox isn't so focused on dynamic real-time 3D audio, it's mostly just streaming playback (where the primary concern is syncing audio to video playback, less about overall latency). It needs very little CPU time to handle the actual output, and uses those rather high parameters to leave itself a lot of headroom.

In my default OpenAL config from Archlinux, OpenAL is using a period size of 2048, which makes the total buffer 6144 for 3 periods. Also changing the period size to 512 and the count to 12 yielded the same effect. This number is higher than the Firefox buffer of 4800, yet OpenAL has extreme crackling for something as simple as "mplayer -ao openal my.mp3" while Firefox has no issues (both unchanged by CPU load and power management). I do not understand how this makes sense, other than that OpenAL is somehow doing something bad (unrelated to buffer sizes), while other programs are not.

I wanted to do more testing, but yesterday I rebooted my computer and now the problem that has been there for almost (or at least) a year is gone. I can set the period_size to 256 and the count to 2 without any issues at all. Lower than that I will get a crackling sound every 10 seconds or so, at a 128 buffer every 2 seconds. I suppose this is normal and totally unlike how it was before (dozens of cracks per second at 12x the buffer size).

I think what is happening is, that this bug is triggered by suspend to memory and it somehow only affects OpenAL. Like many people I only use reboots for system updates and otherwise suspend the PC if I leave it. For a year of rarely using Telegram, most of the time I would experience the heavy crackling but there were also days when it was entirely gone for no reason at all, and totally randomly so from one day to the next. I think I believed at the time I had to restart Telegram many times for the cracks to disappear. It happened so rarely I am not entirely sure about all the other actions I was doing at the same time. It wasn't related to doing system updates though, of which I did a couple.

I did a few suspends and reboots now ... but the bug would not reappear! Maybe it takes a dozen suspends in different situations, like games running at the same time, CPU being maxed out, or USB devices plugged in, in order to mess up interrupts or whatever triggers it. I will monitor it the next weeks.

This is really a mysterious bug, also considering my OpenAL had been configured from the start to run with higher buffer sizes than some other programs do, and still it was the only thing affected.

Could you give me some advice on how to debug this? I have worked with kernel drivers before, but not audio and suspend issues and only on a basic level.

ballerburg9005, Sep 06 '22 11:09

In my default OpenAL config from Archlinux, OpenAL is using a period size of 2048, which makes the total buffer 6144 for 3 periods.

Don't forget the period count is reduced to 2 to compensate for the larger period size. The period size is also increased to compensate for the change in sample rate (since the config file tries to set 44.1khz, but the device changes it to 48khz instead). So you ultimately end up with a period_size of 2229 (2048*48000/44100 = 2229.1156, which rounds to 2229) and a total buffer size of 2229*2 = 4458.
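The rescaling described here can be reproduced in a couple of lines. This is only a sketch of the arithmetic, assuming plain round-to-nearest; it is not OpenAL Soft's actual code.

```python
def scaled_period(period_frames, requested_hz, actual_hz):
    """Period size after rescaling to the device's actual rate (rounded)."""
    return round(period_frames * actual_hz / requested_hz)


period = scaled_period(2048, 44100, 48000)
print(period, period * 2)  # 2229 4458
```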

It's actually a bit surprising the device accepts such a particular period size. There's also a possibility the hardware was having an issue with that weird period size that caused mistimings.

This number is higher than the Firefox buffer of 4800, yet OpenAL has extreme crackling for something as simple as "mplayer -ao openal my.mp3" while Firefox has no issues (both unchanged by CPU load and power management).

OpenAL is still a bit more intensive even for playing an audio stream, since it's applying HRTF and doing its own mixing. OpenAL will be a bit more sensitive to misbehaving hardware or drivers than a normal app that's just playing a simple audio stream directly to ALSA.

Could you give me some advice on how to debug this? I have worked with kernel drivers before, but not audio and suspend issues and only on a basic level.

At the least, you'll need to wait for the issue to return. It's much more difficult to debug an issue that's not occurring. If it does return, I'd first try setting a less weird period size (explicitly set both frequency=48000 and period_size=2048 to keep it from scaling to 2229, or use a period size that scales better, like frequency=44100 and period_size=882 or period_size=1764, for 20ms or 40ms respectively).
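Whether a period size "scales better" can be checked by testing if it maps to a whole number of frames at the device rate. A minimal sketch, assuming the 44100 to 48000 conversion seen in the logs (the helper name is illustrative):

```python
from fractions import Fraction


def scales_exactly(period_frames, requested_hz, actual_hz):
    """True if the period maps to a whole number of frames at the actual rate."""
    return (Fraction(period_frames) * actual_hz / requested_hz).denominator == 1


for period in (2048, 882, 1764):
    exact = scales_exactly(period, 44100, 48000)
    ms = period * 1000.0 / 44100
    print(period, "exact" if exact else "inexact", round(ms, 1), "ms")
# 2048 inexact 46.4 ms
# 882 exact 20.0 ms
# 1764 exact 40.0 ms
```

As far as I know, frequency and period_size are both options in the [general] section of alsoft.conf, but check the sample config shipped with your version.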

As a second thing to try, when no audio is playing, unload the emu10k module (and any other module that directly deals with the sound card). Ensure it's unloaded, then reload it again. If that fixes it, that would suggest a bug causing the hardware or driver to get into a bad state where it's not timing or reporting things correctly, and reloading/reinitializing the driver resets the bad state.
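The unload/check/reload sequence might look like the following dry run, which only prints the commands rather than executing them. The module name `snd_emu10k1` is an assumption on my part; confirm the actual name with `lsmod` first, and note that other modules depending on it may need unloading too.

```python
import shlex

# Assumption: the ALSA kernel module for emu10k1 cards is named snd_emu10k1.
MODULE = "snd_emu10k1"


def reload_commands(module=MODULE):
    """Commands (to run as root, with no audio playing) to reload a module."""
    return [
        ["modprobe", "-r", module],  # unload the module
        ["lsmod"],                   # confirm it is no longer listed
        ["modprobe", module],        # load it again
    ]


# Dry run: print the commands instead of executing them.
for cmd in reload_commands():
    print(shlex.join(cmd))
```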

Thirdly, if that still doesn't help, I would check to see how long snd_pcm_wait actually waits (in OpenAL's AlsaPlayback::mixerProc, in alc/backends/alsa.cpp), by adding timing and trace calls around it. If it keeps waking up too early (before enough free space is available to write the next period) or too late (when there's not enough time to mix/write the next period), that would indicate a lower level issue somewhere in ALSA, the kernel drivers, or the hardware. Relatedly, it would also be helpful to check the return value of snd_pcm_avail_update in the same OpenAL function to ensure ALSA is reporting an appropriate amount of writable space, or if the buffer is constantly near empty or near full.
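The instrumentation itself would have to be added in C++ around snd_pcm_wait, but once wake-up intervals have been logged, judging them could look like this sketch. The tolerance value and the sample data are made up for illustration and are not measurements from this issue.

```python
def classify_wakeups(intervals_ms, period_ms, tolerance=0.25):
    """Label each interval between wake-ups as 'early', 'ok', or 'late'.

    `tolerance` is an arbitrary illustrative threshold (fraction of a period).
    """
    labels = []
    for interval in intervals_ms:
        if interval < period_ms * (1 - tolerance):
            labels.append("early")
        elif interval > period_ms * (1 + tolerance):
            labels.append("late")
        else:
            labels.append("ok")
    return labels


# Made-up sample data for a 20 ms period.
print(classify_wakeups([20.1, 19.8, 41.5, 3.2], period_ms=20.0))
# ['ok', 'ok', 'late', 'early']
```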

Where to go from there will depend on what the above testing actually finds.

kcat, Sep 06 '22 16:09

Ok, I will report back when I get the chance.

Thanks a lot for your help.

Update: I forgot about the problem, and now it just returned at random like before. Not sure what I did, apart from rebooting and suspending my PC. Unfortunately I don't have time to debug today. A period size of 882 * 7 = 6174 seems to work, * 6 does not.

Reloading the driver did fix the problem.

ballerburg9005, Sep 06 '22 18:09