Handling different audio data layouts (interleaved / non-interleaved)
Currently CPAL's stream data callback API allows the user to assume that audio data will be provided with channels interleaved in a single slice of memory. There are some issues with this:
- Not all hosts provide an option for emitting data in an interleaved layout. It's also unclear whether some hosts provide non-interleaved channels within contiguous memory. As a result, CPAL currently does the necessary conversion to and from interleaved for these hosts under the hood - a non-trivial cost for energy/performance-sensitive applications.
- Different downstream applications prefer different layouts, and some hosts allow for specifying a desired layout. E.g. I believe CoreAudio provides an API for requesting interleaved/non-interleaved, though I don't believe there is any guarantee that either is supported, just that at least one of them will be.
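To make the cost of that under-the-hood conversion concrete, here is a minimal sketch of the two conversions in plain Rust. The function names `deinterleave` and `interleave` are made up for illustration and are not CPAL API; a real host backend would presumably reuse a preallocated buffer rather than allocate on every callback.

```rust
/// Split an interleaved buffer (L R L R ...) into one Vec per channel.
/// Returns None if `channels` is zero or the buffer length is not a
/// multiple of the channel count.
/// (Hypothetical helper for illustration, not part of CPAL.)
pub fn deinterleave<T: Copy>(interleaved: &[T], channels: usize) -> Option<Vec<Vec<T>>> {
    if channels == 0 || interleaved.len() % channels != 0 {
        return None;
    }
    let frames = interleaved.len() / channels;
    let mut planar: Vec<Vec<T>> = (0..channels)
        .map(|_| Vec::with_capacity(frames))
        .collect();
    // Each chunk is one frame: one sample for every channel.
    for frame in interleaved.chunks_exact(channels) {
        for (channel, &sample) in planar.iter_mut().zip(frame) {
            channel.push(sample);
        }
    }
    Some(planar)
}

/// Merge per-channel buffers into one interleaved buffer.
/// Returns None if the channels differ in length.
/// (Hypothetical helper for illustration, not part of CPAL.)
pub fn interleave<T: Copy>(planar: &[&[T]]) -> Option<Vec<T>> {
    let frames = planar.first().map_or(0, |c| c.len());
    if planar.iter().any(|c| c.len() != frames) {
        return None;
    }
    let mut out = Vec::with_capacity(frames * planar.len());
    for i in 0..frames {
        for channel in planar {
            out.push(channel[i]);
        }
    }
    Some(out)
}
```

Either direction touches every sample once per callback, which is the overhead a layout-aware API would let applications avoid.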
We should update CPAL's API guided by the following broad goals:
- Support both interleaved and non-interleaved layouts, while keeping in mind the potential for other more obscure layouts in the future. For now at least we can be sure that each supported host provides at least interleaved and/or non-interleaved.
- Allow users to query if a layout is supported by a device.
- Allow for users to request a particular layout when building a stream.
- Allow for users to provide a callback or callbacks for handling data in different layouts depending on what is supported by the device.
I don't have a solid proposal in mind yet, though thought I'd open this so that we have a place to discuss.
Straw proposal:
pub struct Format {
    pub channels: ChannelCount,
    pub sample_rate: SampleRate,
    pub data_type: SampleFormat,
    pub interleaved: bool,
}

impl Data {
    pub fn as_slice<T>(&self) -> Option<&[&[T]]> { ... }
}
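As a rough illustration of what consuming this shape could look like, here is a sketch of a function that handles both layouts by checking the `interleaved` flag. It assumes that in the interleaved case the outer slice holds a single buffer with all channels interleaved, and that in the non-interleaved case it holds one slice per channel. The `ChannelCount`, `SampleRate` and `SampleFormat` definitions are local stand-ins so that the snippet compiles on its own, and `channel_peaks` is a made-up example consumer.

```rust
// Local stand-ins for the types named in the straw proposal (assumptions).
pub type ChannelCount = u16;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SampleRate(pub u32);

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SampleFormat {
    I16,
    U16,
    F32,
}

pub struct Format {
    pub channels: ChannelCount,
    pub sample_rate: SampleRate,
    pub data_type: SampleFormat,
    pub interleaved: bool,
}

/// Peak absolute sample value per channel.
/// Hypothetical consumer: `data` is what `as_slice::<f32>()` might yield.
pub fn channel_peaks(data: &[&[f32]], format: &Format) -> Vec<f32> {
    let channels = format.channels as usize;
    let mut peaks = vec![0.0f32; channels];
    if channels == 0 {
        return peaks;
    }
    if format.interleaved {
        // Assumed shape: a single outer entry with all channels interleaved.
        if let Some(buffer) = data.first() {
            for frame in buffer.chunks_exact(channels) {
                for (peak, sample) in peaks.iter_mut().zip(frame) {
                    *peak = (*peak).max(sample.abs());
                }
            }
        }
    } else {
        // Assumed shape: one outer entry per channel.
        for (peak, channel) in peaks.iter_mut().zip(data) {
            *peak = channel.iter().fold(0.0f32, |m, s| m.max(s.abs()));
        }
    }
    peaks
}
```

The branch on `format.interleaved` is the per-call layout check that any consumer of `&[&[T]]` would need in some form.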
I like this as a nice simple solution for Data and the build_input/output_stream_raw methods.
I wonder how we should handle this in the build_input/output_stream methods?
Perhaps rather than taking &[&[T]] and requiring the user to check the layout on every call, we take a similar approach to #119 and require specifying the layout as a type. I'm imagining |data: &Interleaved<T>| ... where Interleaved<T> derefs to &[T], and NonInterleaved<T> derefs to &[&[T]].
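A minimal sketch of what such wrappers might look like, assuming they borrow the underlying buffers (hence the extra lifetime parameter, which the shorthand above leaves out). The constructors, the `channels` accessor and the two example callbacks are invented for illustration.

```rust
use std::ops::Deref;

/// All channels interleaved in one buffer; derefs to `[T]`.
pub struct Interleaved<'a, T> {
    samples: &'a [T],
    channels: usize,
}

impl<'a, T> Interleaved<'a, T> {
    pub fn new(samples: &'a [T], channels: usize) -> Self {
        Interleaved { samples, channels }
    }

    /// Channel count needed to interpret the flat buffer.
    pub fn channels(&self) -> usize {
        self.channels
    }
}

impl<'a, T> Deref for Interleaved<'a, T> {
    type Target = [T];
    fn deref(&self) -> &[T] {
        self.samples
    }
}

/// One slice per channel; derefs to `[&[T]]`.
pub struct NonInterleaved<'a, T> {
    channels: &'a [&'a [T]],
}

impl<'a, T> NonInterleaved<'a, T> {
    pub fn new(channels: &'a [&'a [T]]) -> Self {
        NonInterleaved { channels }
    }
}

impl<'a, T> Deref for NonInterleaved<'a, T> {
    type Target = [&'a [T]];
    fn deref(&self) -> &[&'a [T]] {
        self.channels
    }
}

// Example callbacks: the layout is visible in the signature, and slice
// methods and indexing work directly through Deref.
pub fn sum_interleaved(data: &Interleaved<'_, f32>) -> f32 {
    data.iter().sum()
}

pub fn sum_first_channel(data: &NonInterleaved<'_, f32>) -> f32 {
    data[0].iter().sum()
}
```

With this shape the user never branches on layout inside the callback; the choice is made once, by which wrapper type the closure accepts.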
I don't think that materially impacts the amount of checking the user has to do in the deinterleaved case. It's low-friction to access channel indices up to the number you requested, which are guaranteed to be there if the create call succeeded. It's definitely a little nicer for the interleaved case to not have to sprinkle [0] everywhere, though, and embedding that flag into the type system might improve readability. Conversely, having to look up what those wrappers are, learn about their Deref impls, and import/qualify them at every use, increases friction, so I'm not sure.
What I'd really like is [&[T]; N], but of course we can't have that until const generics (unless there's a modest upper bound on the number of channels? Supporting crazy professional DAW setups with boatloads of channels is cool and might preclude this, but there's always the dynamic API...).
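For reference, const generics have since been stabilised (Rust 1.51), so the [&[T]; N] shape can now be written directly. Below is a sketch assuming the non-interleaved data arrives as one contiguous buffer with each channel stored back to back; `split_planar` and `stereo_difference` are hypothetical names, and `std::array::from_fn` needs Rust 1.63 or later.

```rust
/// Split a contiguous planar buffer (all of channel 0, then all of
/// channel 1, ...) into N equally sized channel slices.
/// Returns None if N is zero or the length is not a multiple of N.
/// (Hypothetical helper for illustration, not part of CPAL.)
pub fn split_planar<T, const N: usize>(buffer: &[T]) -> Option<[&[T]; N]> {
    if N == 0 || buffer.len() % N != 0 {
        return None;
    }
    let frames = buffer.len() / N;
    Some(std::array::from_fn(|ch| &buffer[ch * frames..(ch + 1) * frames]))
}

/// Example callback body with the channel count fixed in the type:
/// the array can be destructured with no bounds checks on the outer level.
pub fn stereo_difference(channels: [&[f32]; 2]) -> Vec<f32> {
    let [left, right] = channels;
    left.iter().zip(right).map(|(l, r)| l - r).collect()
}
```

The trade-off raised above still applies: N must be known at compile time, so setups with an arbitrary channel count would still need the dynamic API.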
Has any additional conceptual progress been made on this? I'm looking to implement the functionality, so it makes sense to do it as a PR.
I'd like something which has good ergonomics by default, but which doesn't force me to take on overhead (e.g. with ASIO I'm paying the interleaving cost in both directions, despite the data to/from the driver being contiguous).
To get ideal performance everywhere I'd need to provide 2 callbacks for each stream/direction (one optimised for interleaved data and one for non-interleaved data).
To strike a balance between ergonomics and performance I think:
- cpal can accept registration of callbacks for both cases
- cpal must require registration for at least one case, and at runtime perform conversion only if the needed implementation is missing (perhaps indicating that it's happening, and/or letting the developer query the preferred format)
Benefits:
- Developers can choose convenience (write-once in a format convenient to the developer)
- Developers can choose performance (more work, but possible when it matters)
- Types remain simple and signatures are consistent between cases
- registering a callback for some format implies the developer is already aware of how &[T] should be interpreted
- a single function pointer can be updated when the stream is started, which reduces branching
- ~~interleaving contiguous data produces interleaved data, and interleaving interleaved data produces contiguous data~~ (edit: I was mistaken on this)
- if we know we need to convert then the function pointer is to an internal function which interleaves/de-interleaves and then calls the user callback
device
    .build_input_stream(
        &config,
        // Non-interleaved callback
        Some(move |data: &[T], cb_info: &InputCallbackInfo| { /* ... */ }),
        // Interleaved callback
        Some(move |data: &[T], cb_info: &InputCallbackInfo| { /* ... */ }),
        // ... and so on
    )
This version doesn't do anything to the buffers as provided by the driver. If you wanted nicer types or for the build_input_stream function to look cleaner you could create a builder which would add all those extra layers for you.
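The selection logic described above (use the callback that matches the native layout, otherwise wrap the other one in a converter chosen once at stream start) could be sketched roughly as follows. Everything here is hypothetical: `Layout`, `Callback` and `select_handler` are invented names, samples are fixed to f32, the callback info argument is omitted, and "non-interleaved" is assumed to mean one contiguous buffer with the channels stored back to back, in line with the &[T] signatures above.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Layout {
    Interleaved,
    NonInterleaved,
}

/// Hypothetical user callback type (f32 only, no callback info).
pub type Callback = Box<dyn FnMut(&[f32])>;

// Contiguous planar -> interleaved. Trailing samples that do not fill a
// whole frame are dropped.
fn to_interleaved(planar: &[f32], channels: usize, out: &mut Vec<f32>) {
    out.clear();
    let frames = planar.len() / channels;
    for f in 0..frames {
        for c in 0..channels {
            out.push(planar[c * frames + f]);
        }
    }
}

// Interleaved -> contiguous planar, with the same truncation rule.
fn to_planar(interleaved: &[f32], channels: usize, out: &mut Vec<f32>) {
    out.clear();
    let frames = interleaved.len() / channels;
    for c in 0..channels {
        for f in 0..frames {
            out.push(interleaved[f * channels + c]);
        }
    }
}

/// Pick the handler once, when the stream is built. Returns the handler
/// plus a flag saying whether a conversion happens on every call, or None
/// if no callback was registered or `channels` is zero.
pub fn select_handler(
    native: Layout,
    channels: usize,
    non_interleaved: Option<Callback>,
    interleaved: Option<Callback>,
) -> Option<(Callback, bool)> {
    if channels == 0 {
        return None;
    }
    let (matching, other) = match native {
        Layout::Interleaved => (interleaved, non_interleaved),
        Layout::NonInterleaved => (non_interleaved, interleaved),
    };
    if let Some(cb) = matching {
        // Fast path: driver data is handed over untouched.
        return Some((cb, false));
    }
    let mut cb = other?;
    let convert: fn(&[f32], usize, &mut Vec<f32>) = match native {
        Layout::Interleaved => to_planar,
        Layout::NonInterleaved => to_interleaved,
    };
    // Scratch buffer is reused across calls, so it stops allocating once
    // it has grown to the largest buffer size seen.
    let mut scratch: Vec<f32> = Vec::new();
    let wrapped: Callback = Box::new(move |data: &[f32]| {
        convert(data, channels, &mut scratch);
        cb(scratch.as_slice());
    });
    Some((wrapped, true))
}
```

The returned flag is one way to surface "conversion is happening" to the developer; exposing the device's preferred layout through a separate query would serve the same purpose.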