FlexASIO WASAPI Exclusive full duplex glitching

With REW using FlexASIO 1.0 in full duplex mode with some USB device at 48 kHz, I am unable to get glitch-free playback under any set of parameters (buffer size, suggested latency, shared or exclusive mode). The glitches are infrequent, but they're there.

Meanwhile, I am not getting any glitches under any of the other backends when using a buffer size of 20 ms along with a suggested latency of 60 ms (for example).

Dec 07 '18 19:12 dechamps

This is fairly tough to reproduce. There are times where I don't get any glitches for multiple minutes. One thing that happens fairly often is a single glitch less than a few seconds after streaming starts.

Dec 07 '18 21:12 dechamps

With #34 implemented, I was able to investigate this some more by inspecting spectrograms of recorded audio, which is way more reliable and repeatable than doing this by ear.
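As an aside for anyone who wants to automate this kind of check: the sketch below is not the spectrogram approach described above but a swapped-in, simpler technique. It assumes the recording is a clean sine test tone of known frequency, and flags every sample that breaks the recurrence a pure sine obeys, x[n] = 2·cos(ω)·x[n−1] − x[n−2]. The function name, the 1 kHz tone and the threshold are all illustrative assumptions. Real captures contain noise, so the threshold would need tuning.

```python
import math

def find_glitches(samples, freq_hz, sample_rate_hz, threshold=1e-3):
    """Return indices where a sine test tone deviates from its own recurrence.

    Hypothetical helper: assumes `samples` is a recording of a pure sine
    at `freq_hz`, with no noise or distortion beyond `threshold`.
    """
    k = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    glitches = []
    for n in range(2, len(samples)):
        predicted = k * samples[n - 1] - samples[n - 2]
        if abs(samples[n] - predicted) > threshold:
            glitches.append(n)
    return glitches

rate = 48000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / rate) for n in range(4800)]

# Simulate a glitch by dropping 10 samples starting at index 1000.
glitched = tone[:1000] + tone[1010:]

clean_hits = find_glitches(tone, 1000.0, rate)
glitch_hits = find_glitches(glitched, 1000.0, rate)
print(clean_hits, glitch_hits)
```

A discontinuity shows up as two consecutive flagged indices, because the recurrence looks back two samples.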

In FlexASIO 1.3, using some USB audio device, full duplex, and default settings, I do not get any glitches in WASAPI Shared mode. In Exclusive mode, however, there is a weird issue where exactly one glitch seems to be produced in every session. The glitch seems to land randomly between 0 and ~30 seconds after streaming starts.

Jan 05 '19 16:01 dechamps

Here are a few additional findings from the latest dev branch (84b758a244d5a0aa85fbf15258b3e2b0fbd7cfc3):

I am definitely unable to reproduce this in shared mode.
I can still reproduce this in exclusive mode.
I cannot reproduce this when using the input (record) direction only.
I cannot reproduce this when using the output (playback) direction only.
When the glitch occurs it's not just an input glitch - it's visible in the output signal as well.

Clearly this looks like an issue with the way the PortAudio WASAPI logic handles full duplex buffer management in exclusive mode.

Feb 03 '19 11:02 dechamps

Salut Etienne,

I have nearly the same experience with PortAudio and WASAPI. In my setup I play out to PortAudio on a WASAPI output device (i.e. RME) in shared mode and I often get dropouts. I initialise the RME PortAudio WASAPI device with 0 for the frame buffer size, which gives me a frame buffer size of 480 samples or 2880 bytes (24 bit stereo). IMO this is already sub-optimal as the RME board is configured to use 256 samples per buffer.

What I find is that relatively often there are "holes" of 480 samples (2880 bytes of zeros), after which my audio data continues at the very same sample where the zeros were introduced, as if there had been a buffer underrun. Interestingly, PortAudio is not reporting an underrun with its callback flags.
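A minimal sketch of how such holes could be located in a captured stream, assuming the samples have already been decoded into a list of integers and that the surrounding programme material never sits at exactly zero for that long. The function name and the test data are hypothetical.

```python
def find_zero_runs(samples, min_length):
    """Return (start_index, length) for every run of zeros of at least min_length."""
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value == 0:
            if start is None:
                start = i  # a new run of zeros begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - start))
            start = None
    # Handle a run that extends to the end of the capture.
    if start is not None and len(samples) - start >= min_length:
        runs.append((start, len(samples) - start))
    return runs

# Synthetic capture: non-zero audio, a 480-sample hole, then the audio
# resumes at the very sample where it was interrupted.
audio = list(range(1, 201))
capture = audio[:100] + [0] * 480 + audio[100:]

holes = find_zero_runs(capture, min_length=480)
print(holes)
```

Each reported run that is exactly one buffer long (480 samples here) would be consistent with a dropped buffer rather than silence in the source material.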

When I do the same test with a PortAudio WDM-KS or DirectSound or ASIO device there are absolutely no errors.

So I have the feeling that we're suffering from the same problem.

Have you already reported this issue to the PortAudio community?

BTW, how do you do full duplex with WASAPI? I never saw a WASAPI device offering both inputs and outputs.

Cheers Axel

Jul 24 '19 08:07 aholzinger

In my setup I play out to PortAudio on a WASAPI output device (i.e. RME) in shared mode and I often get dropouts. So I have the feeling that we're suffering from the same problem.

I doubt it. This thread is about an issue with exclusive mode full-duplex specifically. You seem to be experiencing a wider problem with WASAPI being unusable at all in any mode.

I initialise the RME PortAudio WASAPI device with 0 for the frame buffer size, which gives me a frame buffer size of 480 samples or 2880 bytes (24 bit stereo)

480 samples at 48 kHz is only 10 ms. Depending on circumstances, it can be difficult for a userspace process to consistently meet a 10 ms scheduling deadline. Have you tried using larger buffers?

IMO this is already sub-optimal as the RME board is configured to use 256 samples per buffer.

If you use WASAPI Shared, you're not dealing with hardware buffers and therefore the hardware buffer size that your device uses is irrelevant. It's only possibly relevant if you use Exclusive mode, but even then, I doubt it because I would expect to find some larger intermediate buffer.

By the way, 256 samples at 48 kHz is about 5 ms. It's very hard for userspace to meet a 5 ms deadline consistently. I doubt you'll be able to use such a small buffer size.
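For reference, the figures quoted in this exchange follow from simple arithmetic. The sketch below is only an illustration, with made-up helper names. It assumes a 48 kHz sample rate and packed 24-bit (3-byte) stereo samples, as described earlier in the thread.

```python
def buffer_duration_ms(frames, sample_rate_hz):
    """Time covered by one buffer, i.e. the deadline for refilling it."""
    return 1000.0 * frames / sample_rate_hz

def buffer_size_bytes(frames, channels, bytes_per_sample):
    """Size of one buffer of interleaved PCM audio."""
    return frames * channels * bytes_per_sample

rate = 48000
for frames in (256, 480, 960):
    print(frames, "frames =", round(buffer_duration_ms(frames, rate), 2), "ms")

# 480 frames of packed 24-bit stereo audio.
print(buffer_size_bytes(480, channels=2, bytes_per_sample=3), "bytes")
```

So 480 frames is exactly 10 ms, 256 frames is about 5.33 ms, and 480 frames of 24-bit stereo is 2880 bytes, matching the numbers above.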

Have you already reported this issue to the PortAudio community?

Not yet, no.

BTW, how do you do full duplex with WASAPI? I never saw a WASAPI device offering both inputs and outputs.

By "full duplex" I mean passing both an input device (i.e. a device with maxInputChannels > 0) and an output device (i.e. a device with maxOutputChannels > 0) to Pa_OpenStream(), as opposed to opening an input-only stream or an output-only stream. WASAPI itself doesn't support full-duplex streams, but PortAudio can emulate them by opening two WASAPI streams and juggling buffers between the two. That's supposed to be transparent to the PortAudio user, and it would be if not for this glitch bug.

Jul 24 '19 09:07 dechamps

480 samples at 48 kHz is only 10 ms. Depending on circumstances, it can be difficult for a userspace process to consistently meet a 10 ms scheduling deadline. Have you tried using larger buffers?

It's not an issue of buffer latencies. PortAudio with ASIO to the RME board with 256 samples works like a charm; even 128 samples (changing the RME configuration and then using 128 samples on the PortAudio side as well) works perfectly. The issue isn't the process or the system. Otherwise PortAudio would report an underrun, wouldn't it?

If you use WASAPI Shared, you're not dealing with hardware buffers and therefore the hardware buffer size that your device uses is irrelevant. It's only possibly relevant if you use Exclusive mode, but even then, I doubt it because I would expect to find some larger intermediate buffer.

I know that audio is routed via the Windows audio service when in shared mode; still, there's no advantage in using a buffer size of 480 samples when the hardware and the low-level driver are using 256.

I did a test with PortAudio WASAPI in exclusive mode and the errors are gone, but the buffer size remains 480 samples.

Also, with PortAudio WASAPI shared and 960 samples the errors are gone, while with PortAudio DirectSound and 480 samples there aren't any errors either.

By the way, 256 samples at 48 kHz is about 5 ms. It's very hard for userspace to meet a 5 ms deadline consistently. I doubt you'll be able to use such a small buffer size.

On good systems you can go down to a buffer size of 64 samples using ASIO. But then it's better to do nothing else than audio on that machine.

Have you already reported this issue to the PortAudio community?

Not yet, no.

Do you plan to do so?

I think the WASAPI backend of PortAudio is lagging behind the DirectSound backend, because of the test results (see above). DirectSound also uses the Windows audio service, so there's no reason why DirectSound should behave better than WASAPI. This makes me believe the cause is the PortAudio WASAPI code.

What is happening on your side if you use WDM-KS instead of WASAPI exclusive? Both go directly to the hardware.

Jul 24 '19 10:07 aholzinger

It's not an issue of buffer latencies. PortAudio with ASIO to the RME board with 256 samples works like a charm; even 128 samples (changing the RME configuration and then using 128 samples on the PortAudio side as well) works perfectly. The issue isn't the process or the system.

Perhaps. Different backends use very different code paths, which makes things hard to compare and reason about. More on this below.

Otherwise PortAudio would report an underrun, wouldn't it?

Overflow/underrun reporting is a hard problem in general. You shouldn't assume PortAudio (or in fact any audio stack in general) can report overflows or underruns reliably. In fact, I wouldn't be surprised if some PortAudio backends did not support overflow/underrun reporting at all, meaning these flags will never be set.

More generally, be careful about assuming anything about PortAudio. If you think PortAudio is some kind of trustworthy, predictable building block that you can rely on, you'll be sorely disappointed. If you don't believe me, go take a look at some of the PortAudio internals (for example the WASAPI internal implementation code). I guarantee you'll be horrified.

I know that audio is routed via the Windows audio service when in shared mode; still, there's no advantage in using a buffer size of 480 samples when the hardware and the low-level driver are using 256.

Last time I checked (which admittedly was quite some time ago), the Windows Audio Engine itself uses 960-sample buffer sizes. That size is fixed and does not depend on the hardware or drivers, which is why there's no point in opening a shared mode stream with a buffer size of 256 samples.

And yes, there is an advantage of using larger buffers in userspace than in hardware drivers, because hardware drivers (and kernel space in general) have more control over scheduling, and can therefore support short deadlines more easily than userspace can. Keep in mind most userspace applications care much more about reliability (i.e. no glitches) than latency, so larger userspace buffers often make more sense.

Also, with PortAudio WASAPI shared and 960 samples the errors are gone, while with PortAudio DirectSound and 480 samples there aren't any errors either.

Be careful: PortAudio often adds its own buffers behind your back. For example, if you ask PortAudio to open a DirectSound stream with a buffer size of 480 samples, your application will indeed use 480-sample buffers, but behind your back PortAudio will use additional, larger buffers (called "host buffers") when interacting with DirectSound. How much additional buffering is inserted depends on a number of factors, such as which Host API you're using, what buffer size you chose, the value of the suggestedLatency option, whether you're opening an input stream, an output stream, or both, etc. In practice it could be 2x-3x the buffer size you use on the application side. This can make things difficult to reason about (PortAudio might not be using the buffer size you think it's using) and can invalidate comparisons.

For added fun, the PortAudio code that deals with computing these buffer sizes can have its own bugs (for example this one that I reported).

The only way to figure out what is actually happening behind the scenes in terms of buffers is to examine the PortAudio debug log output as well as the relevant PortAudio internal implementation code. It's not easy. PortAudio often tries to be all smart and clever, which can be a good thing, but it sure makes it a lot harder to understand what is actually happening.

Have you already reported this issue to the PortAudio community?

Not yet, no.

Do you plan to do so?

Not in the short term, because I'm not actively working on FlexASIO right now. I would need to do a more thorough investigation, and in the end I would probably have to fix the PortAudio code myself (as the PortAudio project is very understaffed), and that takes time and effort.

It's difficult to find the motivation to do this considering that dealing with PortAudio code in its current state is not particularly fun. In fact, I've often wondered whether I should get rid of PortAudio entirely and write a new ASIO driver that wraps WASAPI directly (similar to ASIO2WASAPI), cutting out the middleman. I would suggest you consider that option, especially if you don't need cross-platform support.

I think the WASAPI backend of PortAudio is lagging behind the DirectSound backend, because of the test results (see above).

I think one can definitely make the case that PortAudio's DirectSound code is more stable than its WASAPI code. The PortAudio DirectSound code is very mature, while the WASAPI code is still being actively changed today, and these changes are arguably not very well tested because of a lack of resources on the PortAudio side.

That said, even the DirectSound code is not exempt from bugs - for example see this other issue where the PortAudio DirectSound code gets completely stuck when using a small (but still reasonable) buffer size on the input side.

What is happening on your side if you use WDM-KS instead of WASAPI exclusive? Both go directly to the hardware.

In my experience, PortAudio's WDM-KS code is more reliable than the WASAPI code, probably because it's more mature and undergoes fewer changes. For example WDM-KS doesn't have the glitching issue with full-duplex. On the other hand, it seems easier to achieve lower latencies with WASAPI than WDM-KS.

Jul 24 '19 10:07 dechamps

Etienne, thank you for all the valuable details. I will definitely program directly to WASAPI, avoiding PortAudio, in the long run.

Jul 24 '19 11:07 aholzinger

Update: It's the polling mode! Unfortunately, when not using exclusive mode, the PortAudio code switches to polling mode. When using event-driven mode there are no more dropouts. Thanks again.

Jul 26 '19 11:07 aholzinger
