The SDL3 audio subsystem redesign!
This is a work in progress! (and this commit will probably get force-pushed over at some point).
This rips up the entire SDL audio subsystem! While we still feed the audio device from a separate thread, the audio callback into the app is now ~~gone~~ a totally optional alternative.
Now the app will bind an SDL_AudioStream to a given device and feed data to it. As many streams as one likes can be bound to a device; SDL will mix them all into a single buffer and feed the device from there.
So not only does this function as a basic mixer, it also means that multiple device opens are handled seamlessly (if you want to open the device for your game, but you also link to a library that provides VoIP and it wants to open the device separately, you don't have to worry about stepping on each other, or about the OS refusing to allow multiple opens of the same device, etc).
Here is a simple program that just opens a device, binds two streams to it, and plays them both at the same time, ending when the first stream is exhausted, and looping the other:
```c
#include <SDL3/SDL.h>

int main(int argc, char **argv)
{
    SDL_Init(SDL_INIT_AUDIO);

    SDL_AudioFormat musicfmt, soundfmt;
    int musicchannels, musicfreq, soundchannels, soundfreq;
    Uint8 *musicbuf, *soundbuf;
    Uint32 musicbuflen, soundbuflen;
    SDL_LoadWAV("music.wav", &musicfmt, &musicchannels, &musicfreq, &musicbuf, &musicbuflen);
    SDL_LoadWAV("tink.wav", &soundfmt, &soundchannels, &soundfreq, &soundbuf, &soundbuflen);

    SDL_AudioDeviceID *devices = SDL_GetAudioOutputDevices(NULL);
    SDL_AudioDeviceID device = devices ? SDL_OpenAudioDevice(devices[0], musicfmt, musicchannels, musicfreq) : 0;
    SDL_free(devices);

    if (device) {
        SDL_AudioStream *musicstream = SDL_CreateAndBindAudioStream(device, musicfmt, musicchannels, musicfreq);
        SDL_AudioStream *soundstream = SDL_CreateAndBindAudioStream(device, soundfmt, soundchannels, soundfreq);

        SDL_PutAudioStreamData(musicstream, musicbuf, musicbuflen);
        SDL_free(musicbuf);

        while (SDL_GetAudioStreamAvailable(musicstream) > 0) {
            if (SDL_GetAudioStreamAvailable(soundstream) < soundbuflen) {
                SDL_PutAudioStreamData(soundstream, soundbuf, soundbuflen);
            }
            SDL_Delay(10);
        }

        SDL_DestroyAudioStream(musicstream);
        SDL_DestroyAudioStream(soundstream);
    }

    SDL_free(soundbuf);
    SDL_Quit();
    return 0;
}
```
There are many, many other changes; the best plan is to find README-migration.md in the commit and read up on the differences. Notably: the commit deletes more code than it adds, so in many ways the new audio code is a simplification of both the code and the API.
There is a lot to be done still, but this has been churning in my working copy for weeks now. Since this has finally gotten far enough that it can be made to work in the right conditions, I intend to work out of this PR and then squash it down before merging.
Some notable things to be done, still:
- Right now I've updated the dummy, disk, and pulseaudio backends, just for testing purposes. Everything else will fail to build.
- None of the test apps (except a hacked up loopwave) have been updated for the new API yet.
- A proper test app to test multiple streams has not yet been written.
- I'm still not thrilled with the device open semantics; I'm rethinking those.
- Opening a default device does not work at the moment, because this is not yet hooked up to do anything.
- SDL_GetDefaultAudioInfo is missing in action. I need to decide how to handle this, still. It will return!
- I'd like to have the backends send notifications if the default device changes, and have SDL_audio.c manage hand-off of playback to new devices, instead of each backend that supports this having to manage it, but this work hasn't been started.
- A single-header library that simulates the old audio callback, for apps that can't or won't migrate to a different paradigm, needs to be written (but I don't think it'll be too terrible to build).
- sdl2-compat updates
So we have a ways to go here, but this is the basic idea I'm moving towards. I have to spend some time on SDL 2.28.0 and sdl12-compat next week, and then I'll be returning to this. Feedback is certainly welcome in the meantime!
Fixes #7379. Reference Issue #6889. Reference Issue #6632.
> Now the app will bind an SDL_AudioStream to a given device and feed data to it. As many streams as one likes can be bound to a device; SDL will mix them all into a single buffer and feed the device from there.
Would it be possible to allow mixing to be done by the audio drivers using this new API in order to take advantage of hardware capabilities?
> Would it be possible to allow mixing to be done by the audio drivers using this new API in order to take advantage of hardware capabilities?
I'd have to look into what is available on various APIs, but my suspicion is that most don't offer this; we'd have to keep a separate buffer for each stream anyway, and mixing isn't a high-overhead operation in general.
I'd be more inclined to add SIMD versions of SDL_MixAudioFormat instead.
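To make the "mixing is cheap" point concrete: the core of mixing is just a saturating add per sample, which is exactly the kind of loop SIMD accelerates well. This is an illustrative scalar sketch for Sint16 data only, not SDL's actual implementation, and it ignores the volume scaling that SDL_MixAudioFormat also performs:

```c
#include <stdint.h>
#include <stddef.h>

/* Mix src into dst with saturation; a scalar sketch of what a
   16-bit signed mixer computes per sample (illustrative only). */
static void mix_s16(int16_t *dst, const int16_t *src, size_t samples)
{
    for (size_t i = 0; i < samples; i++) {
        int32_t sum = (int32_t)dst[i] + (int32_t)src[i];   /* widen to avoid overflow */
        if (sum > 32767) {
            sum = 32767;            /* clamp to Sint16 range */
        } else if (sum < -32768) {
            sum = -32768;
        }
        dst[i] = (int16_t)sum;
    }
}
```

A SIMD version replaces the widen/clamp dance with a single saturating-add instruction per vector of samples, which is why it's such an appealing target.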
I'm wondering if maybe it was too aggressive to remove SDL_AudioSpec entirely. A struct with just format, channels and sample rate could be nice, and probably save some code changes.
I think I might expose three extra functions in SDL_AudioStream:
- SDL_LockAudioStream
- SDL_UnlockAudioStream
(There's already a mutex, this just lets others use it explicitly.)
- SDL_SetAudioStreamCallback
Register a function that runs at the start of a SDL_GetAudioStreamData call. The callback can take the chance to add more data to the stream, or query the current amount, etc.
The end result is you can have the SDL2 callback interface, if you want it, and you can have it for each bound stream.
Actual code added to SDL is minimal, the old interface can be implemented without a lot of drama, or latency, or an extra single-header library. If SDL_mixer were so-inclined, it could move each audio channel to a stream and still be able to do posteffects, on-demand decoding, etc.
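The shape of that interface is easy to model outside SDL: a stream that gives a registered callback a chance to top up the buffer right before a read is served. All names below are hypothetical plain-C stand-ins, not SDL API:

```c
#include <string.h>
#include <stddef.h>

/* Toy model of the proposed design: before serving a read, the stream
   invokes a registered callback so it can push more data on demand. */
typedef struct ToyStream ToyStream;
typedef void (*ToyCallback)(void *userdata, ToyStream *stream, size_t wanted);

struct ToyStream {
    unsigned char buf[256];
    size_t len;
    ToyCallback callback;
    void *userdata;
};

static void toy_put(ToyStream *s, const void *data, size_t len)
{
    if (len > sizeof(s->buf) - s->len) {
        len = sizeof(s->buf) - s->len;  /* drop what doesn't fit */
    }
    memcpy(s->buf + s->len, data, len);
    s->len += len;
}

static size_t toy_get(ToyStream *s, void *out, size_t len)
{
    if (s->callback) {
        s->callback(s->userdata, s, len);  /* on-demand refill, SDL2-callback style */
    }
    if (len > s->len) {
        len = s->len;
    }
    memcpy(out, s->buf, len);
    memmove(s->buf, s->buf + len, s->len - len);
    s->len -= len;
    return len;
}

/* Example callback: pushes four bytes whenever the stream asks. */
static void demo_refill(void *userdata, ToyStream *stream, size_t wanted)
{
    (void)userdata; (void)wanted;
    toy_put(stream, "abcd", 4);
}
```

The real thing would run the callback under the stream's mutex (hence exposing SDL_LockAudioStream/SDL_UnlockAudioStream), but the flow is the same: pull-model apps never have to pre-buffer.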
That sounds awesome. :)
Hey, many thanks for all your work on SDL and apologies if this is not the right place to raise this.
One issue I've been running into when using SDL with SDL_mixer is that doing volume fades creates a popping sound because volume is only changed on chunk boundaries in a hard step fashion. This was not something that could be easily fixed in SDL_mixer as it was simply calling MixAudioFormat() with a single volume for the whole chunk. Do you see any way to make the new audio API more flexible so that smooth fades could be more easily implemented by devs and/or in SDL_mixer? There's an old discussion of this in SDL_mixer repo https://github.com/libsdl-org/SDL_mixer/issues/190 .
> One issue I've been running into when using SDL with SDL_mixer is that doing volume fades creates a popping sound because volume is only changed on chunk boundaries in a hard step fashion.
Yeah, this is a legit bug in SDL_mixer, but the interpolation should happen there, I think.
https://github.com/libsdl-org/SDL_mixer/issues/190 is the right place to discuss this.
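For the record, the interpolation fix amounts to ramping the gain per sample across each chunk instead of applying one volume to the whole buffer. An illustrative sketch (not SDL_mixer code; names are made up) for Sint16 mono data:

```c
#include <stdint.h>
#include <stddef.h>

/* Apply a linear gain ramp from vol_start to vol_end across the chunk,
   so fades change smoothly per sample instead of stepping per chunk. */
static void apply_fade(int16_t *samples, size_t count,
                       float vol_start, float vol_end)
{
    for (size_t i = 0; i < count; i++) {
        float t = (count > 1) ? (float)i / (float)(count - 1) : 1.0f;
        float vol = vol_start + (vol_end - vol_start) * t;
        samples[i] = (int16_t)(samples[i] * vol);
    }
}
```

The pop disappears because consecutive chunks meet at the same gain value rather than jumping by a full fade step at the boundary.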
A few more observations:
- `SDL_audio.h` contains the comment `\brief Access to the raw audio mixing buffer for the SDL library.`, which is what the old API provided rather than the new one.
- Would it be possible to have default values for `SDL_OpenAudioDevice()`?
- Am I correct in thinking that calling `SDL_GetAudioStreamData()` for a stream that's bound to a device is undefined behaviour? Is this something that should be guarded against?
- Since the lower level mixing function supports setting the volume, is this something that can be set for streams for simpler use cases?
- It would be nice to have a function that lists the formats, channel counts and sample rates supported natively by the device, so that it can be used in e.g. configuration dialogs.
> and mixing isn't a high-overhead operation in general.
I suspect resampling and channel conversion are likely to be the key issues here (especially on CPUs without hardware floating point), since that needs to be done before mixing happens. That said, there doesn't seem to be anything major in the proposed API that prevents hardware mixing, so this might be something to investigate separately at a later date.
> `SDL_audio.h` contains the comment `\brief Access to the raw audio mixing buffer for the SDL library.`, which is what the old API provided rather than the new one.
Good catch, I'll fix that.
> Would it be possible to have default values for `SDL_OpenAudioDevice()`?
This should accept a device ID of 0 to request the default device, but this isn't hooked up at the moment, and there are still some logistics to figure out. There are going to be some API changes here still, I think.
> Am I correct in thinking that calling `SDL_GetAudioStreamData()` for a stream that's bound to a device is undefined behaviour? Is this something that should be guarded against?
It isn't really undefined so much as it will absolutely ruin your output. :) But it's thread-safe and won't crash the app or anything.
I've thought about this (and also refusing to let the app change the stream's output format when it's bound to an output device), but my thinking is that if you put your finger on a hot stove, eventually you'll figure out to not do that.
> Since the lower level mixing function supports setting the volume, is this something that can be set for streams for simpler use cases?
I think I have a FIXME in there to consider this. I was not going to do this, since it opens up a world of One More Things people would like added until we just reimplement SDL_mixer, but with the callback plan, maybe we can avoid feature creep, so maybe I will add this.
> It would be nice to have a function that lists the formats, channel counts and sample rates supported natively by the device, so that it can be used in e.g. configuration dialogs.
There's an SDL2 API that was lost in here, SDL_GetDefaultAudioInfo, which needs to be readded once I figure out the default device politics. For specific devices, the current format (which is our best guess at a preferred format when not opened), is already available. SDL has never listed all possible formats, and I don't think it's useful to do so...in many cases, you're just moving where data conversion happens if you try to pick a "native" format.
Wishlist item: there should be a way to query if device permission is available, or forbidden, or pending user response, if this is something various platforms expose.
iOS and Android obviously do this for the microphone, but web browsers will forbid access to audio output, too, until the user has interacted with the page, and WinRT makes approval of WASAPI device opens async, presumably for situations where they want users to approve it.
Having a more formal way to deal with that in SDL apps would be nice. I don't know what, yet.
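One possible shape for this, purely hypothetical and not a real SDL API, would be a simple tri-state query:

```c
/* Hypothetical permission states; not a real SDL API, just a sketch
   of the shape such a query might take on platforms that expose it. */
typedef enum {
    AUDIO_PERMISSION_GRANTED,
    AUDIO_PERMISSION_DENIED,
    AUDIO_PERMISSION_PENDING  /* e.g. a browser waiting for user interaction,
                                 or WASAPI's async open approval */
} AudioPermissionState;

static const char *audio_permission_name(AudioPermissionState state)
{
    switch (state) {
        case AUDIO_PERMISSION_GRANTED: return "granted";
        case AUDIO_PERMISSION_DENIED:  return "denied";
        case AUDIO_PERMISSION_PENDING: return "pending";
    }
    return "unknown";
}
```

An app could poll this (or get an event on state changes) and defer opening the device until the answer is "granted" instead of just getting an opaque open failure.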
SDL_AudioStream callbacks are in, and loopwave has been updated to use them for testing purposes, and the changes to move from SDL2 to SDL3 are pretty small with this approach. This is a good improvement.
SDL_AudioSpec is back in, but just as a thing that holds format/channels/frequency. It actually tightens up a bunch of code, and its purpose is really clear now vs SDL2, so I'm happy with its return.
So one stumbling point is that I wanted to remove device open/close and just let people bind streams to devices, but this causes other problems (people will want a definite shutdown point where "closing" the device stops all their sounds, but what do you do if something else also has streams bound to the device? And if you want to pause the device, there's no easy button beyond unbinding all your streams at once, etc).
So I guess we're going to keep an open/close API, and opening will return a new device ID, even though internally these fake devices will all just mix onto a single physical device, but the VoIP library's streams will be logically separated from the movie playback library's streams, and the app's own streams, and pausing a device will just stop mixing one logical group, and closing a device will just unbind that group from the device.
It adds a little internal complexity, but it seems like the right thing to do, and will be less confusing for app developers.
Of course, now we have device ids that can be used with some APIs (SDL_BindAudioStream needs a logical device), device ids that can be used with others (SDL_OpenAudioDevice needs a physical device), and some that can reasonably be used with both (SDL_GetAudioDeviceName).
Have to think on this more.
Actually, this is probably fine. Binding a stream to a physical device will fail, which might be confusing, but everything else can be made to reasonably work, including opening a new logical device from an existing logical device (and might even be useful if you want to make a temporary logical grouping of streams).
Ok, logical audio devices are in, here's the silly test program doing the two streams with music and sound, plus a second open of the same device (done by opening the logical device's id, so you don't have to keep the original physical device id around!), playing the sound at an offset, so you can hear them all mixing into a single buffer for the actual hardware:
```c
#include <SDL3/SDL.h>

int main(int argc, char **argv)
{
    SDL_Init(SDL_INIT_AUDIO);

    Uint8 *musicbuf, *soundbuf;
    Uint32 musicbuflen, soundbuflen;
    SDL_AudioSpec musicspec, soundspec;
    SDL_LoadWAV("music.wav", &musicspec, &musicbuf, &musicbuflen);
    SDL_LoadWAV("tink.wav", &soundspec, &soundbuf, &soundbuflen);

    SDL_AudioDeviceID *devices = SDL_GetAudioOutputDevices(NULL);
    const SDL_AudioDeviceID device = devices ? SDL_OpenAudioDevice(devices[0], &musicspec) : 0;
    SDL_free(devices);

    if (device) {
        const SDL_AudioDeviceID device2 = SDL_OpenAudioDevice(device, &musicspec);
        SDL_AudioStream *musicstream = SDL_CreateAndBindAudioStream(device, &musicspec);
        SDL_AudioStream *soundstream = SDL_CreateAndBindAudioStream(device, &soundspec);
        SDL_AudioStream *soundstream2 = SDL_CreateAndBindAudioStream(device2, &soundspec);
        Uint64 nextsound = 0;

        SDL_PutAudioStreamData(musicstream, musicbuf, musicbuflen);
        SDL_free(musicbuf);

        while (SDL_GetAudioStreamAvailable(musicstream) > 0) {
            if (SDL_GetAudioStreamAvailable(soundstream) < soundbuflen) {
                SDL_PutAudioStreamData(soundstream, soundbuf, soundbuflen);
            }
            if (SDL_GetAudioStreamAvailable(soundstream2) == 0) {
                if (!nextsound) {
                    nextsound = SDL_GetTicks() + 1000;
                } else if (nextsound <= SDL_GetTicks()) {
                    SDL_PutAudioStreamData(soundstream2, soundbuf, soundbuflen);
                }
            }
            SDL_Delay(10);
        }

        SDL_DestroyAudioStream(musicstream);
        SDL_DestroyAudioStream(soundstream);
        SDL_DestroyAudioStream(soundstream2);
        SDL_CloseAudioDevice(device2);
        SDL_CloseAudioDevice(device);
    }

    SDL_free(soundbuf);
    SDL_Quit();
    return 0;
}
```
This is complexity the average app won't need directly; it's intended to make things work when some external library wants to open a device too, and doesn't coordinate with the app to share one.
But it's also kinda glorious.
Latest commit still has some loose ends to tie up, but not only are most of the details for default devices back in place, SDL can now handle migrating playback to a new default device when the system default changes.
Before, this was pretty much something we handled explicitly in the CoreAudio backend (and just asked PulseAudio to manage for us implicitly), but now for any backend, we can just scoot all the logical devices over to different physical hardware, change the format of the business end of their audio streams, and keep going.
The backend just needs to be able to tell us when a new default device was selected (user changed it in system controls, they plugged in headphones, etc), and the higher level does the rest!
Nice!
- Rebased this to branch from the top of main, since I was getting behind.
- Default device opens still migrate, as mentioned before, but now it's smart enough to manage a USB cable being yanked out without warning until the OS picks a new default device. Previously this was only smart enough to deal with the user choosing a new default while all involved hardware was still functioning.
- SDL_GetDefaultAudioInfo() is removed; you can request default format info from `SDL_GetAudioDeviceFormat(SDL_AUDIO_DEVICE_DEFAULT_OUTPUT, &spec)` now, and since we no longer open devices with a name string and default device opens can quietly migrate to different physical hardware, it's better to show the user a name like "System default" than the specific current default device anyhow.
sdl2-compat work is sitting in https://github.com/libsdl-org/sdl2-compat/pull/80, which was like climbing a mountain, but I've almost reached the summit now.
So I'm reworking the Pipewire backend, and while this is proving to be a good test of the new system for backends that provide their own threads, I'm wondering if the Pipewire implementation is wrong. I suspect the API is meant to work like the new PulseAudio threading code, where you let it spin one thread to dispatch PulseAudio events, and then all your threads cooperate around that.
Right now it spins a thread for each device, which is not bad in itself and is what we would do anyway, but I'm wondering if each thread they spin is fighting over the same socket and event queue anyhow, and whether we should restructure this to match the new PulseAudio code.
As an added bonus, then it can use the standard SDL device thread.
> this is proving to be a good test of the new system for backends that provide their own threads
This part turned out to be awesome, btw. PipeWire (and CoreAudio, etc) run their own threads and inevitably end up sort of reproducing the normal SDL thread's code, and fail to pick up fixes and changes. They also tend to have to throw an extra audio stream in there to deal with conversion, etc.
Now, PipeWire's thread code is this...

```c
static void output_callback(void *data)
{
    SDL_OutputAudioThreadIterate((SDL_AudioDevice *)data);
}
```
...just a massive, massive win here over SDL2.
Very nice! When originally writing the PipeWire code, trying to adapt it to the existing SDL2 audio thread system was proving difficult, so it made sense to use the native thread loop system, and even that wasn't entirely painless with the old audio infrastructure. If that's all that's needed now, hooray!
@slouken We definitely landed on Windows 7 and later for SDL3, right? We've got conditional code in the DirectSound backend to offer functionality that's only on Vista and later, and I'm just going to make that unconditional if so.
(But if this is like the only thing that would keep us off WinXP, I'll keep it.)
I think we're officially Windows 7 and later, but we haven't broken anything for XP yet.
DirectSound took a lot longer than I expected, because I got medieval on SDL_immdevice.c, but that's working now. WASAPI will hopefully go more quickly, but it's also a lot more complex...but there's a ton of code devoted to managing switching between default devices and keeping AudioStreams around to buffer between the extra format differences that might occur in these cases, and all of that code is going to straight-up evaporate away.
> but there's a ton of code devoted to managing switching between default devices and keeping AudioStreams around to buffer between the extra format differences that might occur in these cases, and all of that code is going to straight-up evaporate away
...and in this way, it did simplify, but in others, it got worse.
WASAPI is a little touchy about threading in general; for example, it runs device change notification callbacks in its own thread, but will deadlock if you try to release a device handle before that callback returns, not to mention various COM violations...plus I was thinking about that person who tried to open a device on a background thread and how CoInitialize wasn't done there, etc.
I ended up writing a little thing that keeps a thread in the background--our WASAPI management thread--and various things that are sensitive to multithreading get proxied through there, so COM objects are all created, live, and die on the same thread. The usual mainloop of audio doesn't need to be proxied, but a lot of device add/remove/change stuff does, as does open and close of the devices.
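Structurally, that's just a task queue drained by one dedicated thread. Here's a stripped-down, single-threaded toy of the marshaling pattern (all names hypothetical, and real code would enqueue under a mutex and block on a condition variable until the task completes):

```c
#include <stddef.h>

/* Toy proxy queue: callers enqueue work, one designated thread drains it,
   so every task runs on the same thread (as COM apartment rules require). */
typedef void (*ProxyTask)(void *userdata);

typedef struct {
    ProxyTask tasks[16];
    void *userdata[16];
    size_t count;
} ProxyQueue;

static int proxy_enqueue(ProxyQueue *q, ProxyTask fn, void *ud)
{
    if (q->count == 16) {
        return 0;  /* queue full */
    }
    q->tasks[q->count] = fn;
    q->userdata[q->count] = ud;
    q->count++;
    return 1;
}

static void proxy_drain(ProxyQueue *q)  /* runs on the management thread */
{
    for (size_t i = 0; i < q->count; i++) {
        q->tasks[i](q->userdata[i]);
    }
    q->count = 0;
}

/* Example task: bump a counter (stands in for "open a COM device", etc). */
static void demo_increment(void *userdata)
{
    (*(int *)userdata)++;
}
```

The point of the pattern is that the COM objects are created, used, and released by `proxy_drain`'s thread only, no matter which thread requested the work.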
It's interesting, and it got the job done, but it feels unobvious and fragile. I might rip it out and try again later.
Hey, @isage, I'm updating the Vita audio code for the new SDL3 interfaces, and I have dumb questions:
In SDL3, the backend code has to do its waiting in VITAAUD_WaitDevice; VITAAUD_PlayDevice should not block...this didn't matter in SDL2, but in SDL3, WaitDevice doesn't hold a mutex whereas PlayDevice does, so it's important to get this changed.
For output, can we just do something like this for WaitDevice?
```c
while (sceAudioOutGetRestSample(device->hidden->port) > device->spec.size) {
    SDL_Delay(1);
}
```
I'm not sure if sceAudioOutGetRestSample works like this, or if there is a specific amount of data (instead of "wait until I can write at least a full buffer") I should aim for here. Also, I don't know if the dumb SDL_Delay() is appropriate or if there's a better way to wait for this event.
AudioIn doesn't appear to have anything equivalent, but this should also block in the (new!) WaitCaptureDevice() interface, which works just like WaitDevice, but for input. I assume sceAudioInInput blocks, since it only returns a full buffer at a time. Is there a reasonable way to wait on this buffer to be filled, or should we just say "this buffer holds X milliseconds of audio, let's wait X-1 milliseconds before reading" ...?
sceAudioOutGetRestSample returns either spec.samples or 0 (at least for SCE_AUDIO_OUT_PORT_TYPE_BGM). And yeah, SDL_Delay seems like the only way. Yes, sceAudioInInput blocks, and there seems to be no API to check whether that buffer is filled.