rodio Span-less rodio?

I have had some time and have been pouring it into rodio lately. Having now written a lot of span/parameter_change related code, I have some opinions :) I have paused progress on the parameters_changed() PR while working out whether an alternative (see below) works better.

A lot of performance is being lost by dealing with spans in various sources. Let's take Buffered as an example. Since we have to notify any consumer at the right moment of a sample rate or channel count change, we end up having to keep track of a lot.

This is (just a part of) the Buffered::next() method I have been working on this morning:

if self.samples_index < self.shared.samples_in_memory.len() {
	let sample = self.shared.samples_in_memory[self.samples_index];

	// this sample is the first one with new parameters
	if self.samples_index == self.next_parameter_change_at {
		let new_params = &self.shared.parameters[self.parameter_changes_index];
		self.sample_rate = new_params.sample_rate;
		self.channel_count = new_params.channel_count;
		// move on to the next entry so the lookup below finds the next change
		self.parameter_changes_index += 1;
	}

	// on the sample after the one where we flagged a parameter change,
	// look up where the next change happens
	if self.samples_index > self.next_parameter_change_at {
		self.next_parameter_change_at =
			if self.parameter_changes_index >= self.shared.parameters.len() {
				usize::MAX
			} else {
				self.shared.parameters[self.parameter_changes_index].index
			};
	}

	self.samples_index += 1;
	return Some(sample);
}

Now if we instead got rid of spans entirely *1, this (part of the) method would become trivial:

if self.samples_index < self.shared.samples_in_memory.len() {
	let sample = self.shared.samples_in_memory[self.samples_index];
	self.samples_index += 1;
	return Some(sample);
}

Alternative: no spans

Get rid of spans. Source will no longer have the member functions channels() & sample_rate(); instead it gets set_channel_count() & set_sample_rate(). The consumer (OutputStream) uses those to communicate the picked sample rate and channel count to the edges of the audio tree.
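
A rough sketch of what such an interface could look like. Only set_sample_rate() and set_channel_count() come from the proposal; the trait name SpanlessSource, the Square struct and its fields are made up for illustration, and the placeholder defaults are an assumption.

```rust
/// Hypothetical span-less source: parameters are pushed down from the
/// consumer instead of being reported upward by the source.
trait SpanlessSource: Iterator<Item = f32> {
    fn set_sample_rate(&mut self, sample_rate: u32);
    fn set_channel_count(&mut self, channel_count: u16);
}

/// A square wave generator that adapts to whatever the output picked.
struct Square {
    frequency: f32,
    sample_rate: u32,
    channel_count: u16,
    /// Counts individual samples (not frames) produced so far.
    samples_produced: u64,
}

impl Square {
    fn new(frequency: f32) -> Self {
        // Placeholder parameters until the consumer sets the real ones.
        Self {
            frequency,
            sample_rate: 44_100,
            channel_count: 1,
            samples_produced: 0,
        }
    }
}

impl Iterator for Square {
    type Item = f32;

    fn next(&mut self) -> Option<f32> {
        // Interleaved output: all channels of one frame carry the same value.
        let frame = self.samples_produced / self.channel_count as u64;
        self.samples_produced += 1;
        let period_frames = self.sample_rate as f32 / self.frequency;
        let phase = (frame as f32 % period_frames) / period_frames;
        Some(if phase < 0.5 { 1.0 } else { -1.0 })
    }
}

impl SpanlessSource for Square {
    fn set_sample_rate(&mut self, sample_rate: u32) {
        self.sample_rate = sample_rate;
    }

    fn set_channel_count(&mut self, channel_count: u16) {
        self.channel_count = channel_count;
    }
}
```

The generator never has to announce a parameter change: next() only reads the values the consumer set once, which is what removes the span bookkeeping.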

Schematic example

         OutputStream @ sr1,chs 4
                     |
                   mixer
                 /       \
          low_pass         mixer
            /             /     \___
         mixer         queue        \
        /     \          |        convertor
    Square   Noise   convertor        |
    @44.1/4  @44.1/4     |         decoder
                      decoder      @48.0/1
                      @96.1/2

Here OutputStream picks a sample rate and channel count, then forwards them to the edges. Generators like Square & Noise use that sample rate, while a resampler is inserted between every variable sample rate edge and the tree. As an optimization, the optimal target parameters might be picked from those available by inspecting the edges of the tree. For that we could add the functions preferred_sample_rate() & preferred_channel_count().
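
One way that optimization could work, as a sketch only. preferred_sample_rate() and preferred_channel_count() are the names suggested above; the Edge trait, FixedEdge, both pick_* functions and the selection policy (most common supported rate, largest requested channel count capped by the device) are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical view of a tree edge (decoder, microphone, generator, ...).
/// `None` means "no preference", e.g. a generator that adapts to anything.
trait Edge {
    fn preferred_sample_rate(&self) -> Option<u32>;
    fn preferred_channel_count(&self) -> Option<u16>;
}

/// Stand-in edge with fixed preferences, only here to exercise the functions.
struct FixedEdge {
    sample_rate: Option<u32>,
    channel_count: Option<u16>,
}

impl Edge for FixedEdge {
    fn preferred_sample_rate(&self) -> Option<u32> {
        self.sample_rate
    }

    fn preferred_channel_count(&self) -> Option<u16> {
        self.channel_count
    }
}

/// Pick the supported sample rate that most edges prefer, so that as few
/// resamplers as possible have real work to do. Ties go to the higher rate.
/// Returns `None` when no edge prefers a supported rate.
fn pick_sample_rate(edges: &[&dyn Edge], supported: &[u32]) -> Option<u32> {
    let mut votes: HashMap<u32, usize> = HashMap::new();
    for rate in edges.iter().filter_map(|edge| edge.preferred_sample_rate()) {
        if supported.contains(&rate) {
            *votes.entry(rate).or_insert(0) += 1;
        }
    }
    votes
        .into_iter()
        .max_by_key(|&(rate, count)| (count, rate))
        .map(|(rate, _)| rate)
}

/// Pick the largest channel count any edge asks for, capped by the device.
fn pick_channel_count(edges: &[&dyn Edge], max_supported: u16) -> Option<u16> {
    edges
        .iter()
        .filter_map(|edge| edge.preferred_channel_count())
        .max()
        .map(|count| count.min(max_supported))
}
```

When either function returns None, the OutputStream would simply fall back to the device default.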

Types

We could introduce a new trait SourceEdge that is the current Source. A SourceEdge would have a member convertor() that transforms it into a Source. An alternative would be integrating the convertor into each decoder. This probably has some performance advantages.
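
To make the first option concrete, here is a sketch. SourceEdge and convertor() are the names proposed above; the Convertor struct, the VecEdge test edge, the mono-only handling and the linear interpolation (a stand-in for a proper resampler) are all assumptions. In a real span-less rodio the Convertor would implement the new Source trait rather than just Iterator.

```rust
/// Hypothetical: roughly the current `Source`, an edge that reports its own rate.
trait SourceEdge: Iterator<Item = f32> + Sized {
    fn sample_rate(&self) -> u32;

    /// Wrap the edge so it produces samples at `target_rate`.
    fn convertor(self, target_rate: u32) -> Convertor<Self> {
        Convertor::new(self, target_rate)
    }
}

/// Mono linear-interpolation resampler, standing in for a real one.
struct Convertor<S: SourceEdge> {
    input: S,
    /// Input samples advanced per output sample.
    step: f64,
    /// Fractional position between `current` and `next`.
    position: f64,
    current: Option<f32>,
    next: Option<f32>,
}

impl<S: SourceEdge> Convertor<S> {
    fn new(mut input: S, target_rate: u32) -> Self {
        let step = input.sample_rate() as f64 / target_rate as f64;
        let current = input.next();
        let next = input.next();
        Self { input, step, position: 0.0, current, next }
    }
}

impl<S: SourceEdge> Iterator for Convertor<S> {
    type Item = f32;

    fn next(&mut self) -> Option<f32> {
        // Advance through the input until `position` lies in [0, 1).
        while self.position >= 1.0 {
            self.current = self.next;
            self.next = self.input.next();
            self.position -= 1.0;
        }
        let current = self.current?;
        // Hold the last sample once the input has run out.
        let next = self.next.unwrap_or(current);
        let sample = current + (next - current) * self.position as f32;
        self.position += self.step;
        Some(sample)
    }
}

/// Stand-in for a decoder: plays back a vector at a fixed rate.
struct VecEdge {
    samples: std::vec::IntoIter<f32>,
    sample_rate: u32,
}

impl Iterator for VecEdge {
    type Item = f32;

    fn next(&mut self) -> Option<f32> {
        self.samples.next()
    }
}

impl SourceEdge for VecEdge {
    fn sample_rate(&self) -> u32 {
        self.sample_rate
    }
}
```

With equal input and target rates the step is exactly 1.0 and the interpolation degenerates to a pass-through of the input samples.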

Many decoders at the same sample rate

Given five decoders that all need to be mixed, four of them at a sample rate of 44.1khz, one at 96.1khz, and the output setting the sample rate for the tree to 48.0khz. Keep in mind that in this new span-less rodio the mixer does not do any conversions. In the current version of rodio we could build the tree such that we would mix the four matching decoders and only then resample. Given that hifi resampling is performance intensive, that would speed things up a lot.

I can imagine this is a common scenario in game audio: the decoder at 96.1khz could be a microphone, for example, and the other decoders sound effects. Can we still optimize this? Maybe, if we introduce subtrees, each at a fixed sample rate, with conversions in between, as seen below. This is something the user would need to set up themselves.

                                     OutputStream @ sr1,chs 4
                                            |
                                          mixer
                                          /   \
                                    convertor  \
                                        |       \
       ------------------------------ mixer      \_____
       |           |          |          |             \
    convertor   convertor  convertor  convertor         \
       |           |          |          |            convertor
    decoder     decoder     decoder    decoder           |
     @44.1       @44.1       @44.1     @44.1          decoder
                                                       @96.1

Note we still need convertors in between in case the decoders change their sample rate. Those could skip the actual resampling when the sample rate does match.

Re-negotiation

This is an advanced feature, I am describing it to see how far we can get regarding performance in a span-less rodio.

The edges of the tree could vote to renegotiate. They would increment a shared AtomicIsize if they want to renegotiate and decrement it if they do not. This would be useful when mixed sources move from differing sample rates to the same one. The outcome of the vote would be determined by the nearest mixer that has a convertor after it (we might want to make that a separate thing, converting_mixer or something like that).
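
A sketch of that vote. The shared AtomicIsize is from the description above; the Ballot type, its method names, the simple-majority rule and the reset after counting are assumptions.

```rust
use std::sync::atomic::{AtomicIsize, Ordering};
use std::sync::Arc;

/// Hypothetical handle shared by all edges below one converting mixer.
#[derive(Clone)]
struct Ballot {
    tally: Arc<AtomicIsize>,
}

impl Ballot {
    fn new() -> Self {
        Self { tally: Arc::new(AtomicIsize::new(0)) }
    }

    /// Called by an edge that would like a different mixer rate.
    fn vote_for(&self) {
        self.tally.fetch_add(1, Ordering::Relaxed);
    }

    /// Called by an edge that is happy with the current mixer rate.
    fn vote_against(&self) {
        self.tally.fetch_sub(1, Ordering::Relaxed);
    }

    /// Called by the mixer: renegotiate when more edges voted for than
    /// against. Resets the tally so the next round starts from zero.
    fn should_renegotiate(&self) -> bool {
        self.tally.swap(0, Ordering::Relaxed) > 0
    }
}
```

Relaxed ordering is enough for the counter itself. If the vote has to be ordered relative to other shared state (for example the new sample rate an edge publishes), stronger orderings would be needed.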

Example

                                     OutputStream @ sr1,chs 4
                                            |
                                          mixer
                                          /   \
                                    convertor  \
                                        |       \
       ------------------------------ mixer      \_____
       |           |          |          |             \
    convertor   convertor  convertor  convertor         \
       |           |          |          |            convertor
    decoder     decoder     decoder    decoder           |
     @42.0       @22.0       @22.0     @96.1          decoder
     ->44.1      ->44.1      ->44.1    ->44.1          @96.1

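Here the four decoders mixed on the left start at different sample rates. The first negotiation picks 22.0khz as the target sample rate for the mixer. Once all four have moved to 44.1khz, a renegotiation can pick 44.1khz, and the four convertors below the mixer no longer need to resample.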

Speed source needs an extra resampling step

An elegant thing about the current rodio version is that changing audio speed is simply changing the reported samplerate. So if an audio tree speeds up audio and then slows it down that only takes effect when resampling just before the OutputStream. That is quite optimal.

If we make the sample rate in the tree constant, speed no longer works. This is not a large problem, as we can adapt speed to work again by resampling during the speed step. A more refined speed source would work without resampling, using an FFT so that the pitch does not increase/decrease like it does with the current one.
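
To put a number on the cost, a small worked example. reported_rate() mirrors the description above of speed as a rescaled reported sample rate; the function names and the one-pass-per-speed-step counting for the span-less case are assumptions.

```rust
/// Current rodio: each speed step only rescales the reported sample rate.
fn reported_rate(source_rate: u32, factors: &[f32]) -> u32 {
    factors
        .iter()
        .fold(source_rate, |rate, factor| (rate as f32 * *factor) as u32)
}

/// With spans: a single resampling pass just before the output, and only
/// when the final reported rate differs from the output rate.
fn passes_with_spans(source_rate: u32, factors: &[f32], output_rate: u32) -> usize {
    usize::from(reported_rate(source_rate, factors) != output_rate)
}

/// Span-less: one pass at the edge if the rates differ, plus one pass for
/// every speed step that actually changes the speed.
fn passes_spanless(source_rate: u32, factors: &[f32], tree_rate: u32) -> usize {
    let at_edge = usize::from(source_rate != tree_rate);
    let in_speed_steps = factors.iter().filter(|&&factor| factor != 1.0).count();
    at_edge + in_speed_steps
}
```

For a 44.1khz source that is sped up by 2.0 and later slowed down by 0.5, with a 44.1khz output, passes_with_spans gives 0 and passes_spanless gives 2.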

*1: @PetrGlad brought this up recently and I have seen the suggestion in an old comment by an earlier rodio maintainer. I dismissed it at the time as too big a change/impossible.

Mar 03 '25 11:03 yara-blue

When I read the title of this PR, I thought: yes please. As you know, I am of the position that span constructs are fraught with peril: they try to guarantee something that is very hard to guarantee, and then the rest of the chain depends on it.

Thanks for this extensive write-up. It opens the mind to many more use-cases than simply playing a singular source or mixing two compatible sources.

In the same line of thought, to "choose" between alternatives, I'm also thinking: what are the use-cases we want to optimize for? On the one extreme, we may have "plug & play". Stick whatever you want into Rodio and it'll re-negotiate, do magic (perform some black box & opinionated resampling/mixing) and boom, it works. On the other extreme, we may have "the user in the driving seat": the user makes sure that whatever they put in is decorated with the required resampling, and/or configures Rodio by pulling the relevant set_foo_bar levers.

Without much of a market survey, my personal preference is towards the latter: for advanced scenarios, require the user to be in control. Keeps the code base simpler while catering to probably the most common use case of having compatible sources. And if not, the user should know what to do.

Mar 06 '25 09:03 roderickvd

> Without much of a market survey

The best we have is the user stories (https://github.com/RustAudio/rodio/issues/626). I took another look at them: there are synthesizers/music apps/games and music players. Then there is bevy-audio. They were migrating away from rodio and may still be, though right now bevy audio still depends on rodio. One of us should talk to them or inspect the bevy audio crate and find out what they need. I'll see if I can schedule a call or something with one of their maintainers.

Mar 06 '25 11:03 yara-blue

I just listed the bevy audio design doc in #626. It might work well as input. I have also contacted one of their maintainers to talk about rodio and bevy.

I am planning to wait with making a choice until they have answered.

Mar 06 '25 12:03 yara-blue

Regarding game-dev: the dev could make sure every source has the same, constant sample rate. Then you would only need a resampler at the root of the tree. I feel like that should be a reasonable ask for optimal performance. We could even introduce a convertor_unchecked() that just assumes the decoder has the right sample rate. That would get that last check out of the way.

Mar 06 '25 15:03 yara-blue

> The dev could make sure every source is the same sample rate and a constant one. Then you would only need a resampler at the root of the tree. I feel like that should be a reasonable ask for optimal performance.

💯 agreed.

Mar 06 '25 17:03 roderickvd

I've reached out to the bevy devs (see 168f9e2d37c60bcce836574bbb6d3a60204ed3fb). I want to give that a week before I continue on this. I'll report back what I learn.

Mar 07 '25 14:03 yara-blue

I need more time to digest this. One cannot negotiate with the output stream after it is opened, so that side of the chain is fixed, unless one would want to inspect the whole graph and find some consensus among the Sources. I certainly do not :) Especially since some parameters may not be known initially. As I suggested before, I'd prefer at most one input converter per input, and only as necessary, and to run the whole graph at the output stream's frame rate. It did not occur to me that we can pool converters. Yes, that may save resources, but it complicates things, especially if one would want to do this automatically somehow. But we can have some user guide for performance optimization instead.

Regarding the buffer implementation: if one can turn adaptable sources into queues, it could be possible to just add another source with the new parameters to the queue every time the parameters change. I am not sure if that would be worth the effort...

If we have a separate type for variable sources, and an adapter (UniformSampleSource) that exposes a fixed frame-rate stream, that may help to type check this. I do prefer (if possible) that all sources that can produce a fixed frame rate do so. In that case uniform sources may still have frame-rate and channel count/mapping getters, but those will be expected to be constant. Alternatively, a builder/constructor can set these parameters at construction time.

So I think for sample rate it is a good idea, but I am not sure what to do with a variable channel count (or channel mapping). Which use cases do we have for this? I imagine some compressed audio may have mono and stereo fragments. Will it always convert those to the max number of channels? How do we know what the max number would be? What about sources that dynamically combine several inputs? Say, what should a mixer do that may have a variable number of inputs, so that the number of channels or the channel mapping is not known at first?

I will re-read the ticket in case I missed something.

Mar 09 '25 17:03 PetrGlad

About that re-negotiation step. How exactly would it help? By removing resamplers altogether when the output frequency happens to match the input? Or, for example, by switching to custom resamplers when some simplified algorithm is possible? E.g. I can imagine that resampling with simpler frequency ratios (.. 1/3, 1/2, 1=nop, 2, 3,...) can potentially be simpler than a generic a/b ratio. So having $N$ inputs with potentially differing sample rates, the savings in resampling that I can imagine would approximately be $\frac{E}{N}$, where $E$ is the number of sources that can use shortcuts. @dvdsk Is this what you had in mind?

I would expect channel conversion to be cheap, and I am not sure how to get rid of it changing dynamically. In the end we can only have as many channels as the output stream supports, but there can be sources that switch channels on the fly. E.g. patch-bay-like apps, like the one in this ticket (https://github.com/RustAudio/rodio/issues/484).

Mar 16 '25 15:03 PetrGlad

> About that re-negotiation step. How exactly would it help? By removing resamplers altogether when the output frequency happen to match the input?

By adding an extra resampling point, the mixer, instead of resampling only at the incoming edges. An example:

we have a mixer with 4 incoming streams
all those streams have, at some point, the same sample rate: A
the output has a fixed sample rate: X

Without negotiation we just resample A to X at all the edges. With negotiation we keep sample rate A until we get to the mixer; there we first mix the samples and then resample. Instead of resampling four times we only resample once.

A more complex case would have:

4 incoming streams
3 at rate A
1 at rate B
output at rate X

Now we can resample B to A and, after mixing, A to X. Instead of resampling four times we only do so twice.
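
The counting in both cases can be sketched like this. The function names and the policy of running the mixer at the most common input rate are assumptions; the numbers for A, B and X in the usage note are only examples.

```rust
use std::collections::HashMap;

/// Resamplers doing real work when every incoming edge converts straight
/// to the output rate.
fn resamplers_at_edges(input_rates: &[u32], output_rate: u32) -> usize {
    input_rates.iter().filter(|&&rate| rate != output_rate).count()
}

/// Resamplers doing real work when the mixer runs at the most common input
/// rate and a single convertor after the mixer produces the output rate.
fn resamplers_with_negotiation(input_rates: &[u32], output_rate: u32) -> usize {
    let mut counts: HashMap<u32, usize> = HashMap::new();
    for &rate in input_rates {
        *counts.entry(rate).or_insert(0) += 1;
    }
    counts
        .into_iter()
        // Most common rate wins; ties go to the higher rate.
        .max_by_key(|&(rate, count)| (count, rate))
        .map_or(0, |(mixer_rate, matching)| {
            let before_mixer = input_rates.len() - matching;
            let after_mixer = usize::from(mixer_rate != output_rate);
            before_mixer + after_mixer
        })
}
```

With A = 44100, B = 96000 and X = 48000, four streams at A give 4 resamplers at the edges versus 1 with negotiation; three at A plus one at B give 4 versus 2.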

> I would expect channel conversion would be cheap and I am not sure how to get rid of it dynamically changing.

I have not figured out what to do about the channel count changing. To get rid of spans (which is the goal that makes this all worth it) we have to get rid of it.

> there can be sources that switch channels on the fly. E.g. patch-bay-like apps like the one in https://github.com/RustAudio/rodio/issues/484.

Damn, that is a pity. Maybe we can have fixed pipelines, and users writing such an app would connect multiple such pipelines. They could give the middle pipeline as many channels as they will eventually need.

I reached out to Bevy's audio team and the outcome was interesting. @PetrGlad I sent you and Roderick a mail about that. Did you receive it? If not, could you send your mail address to me? My mail address is opensource@. You will find the domain on my github profile.

Mar 23 '25 17:03 yara-blue

@dvdsk I am sorry for not answering lately, I have taken on more responsibilities than I could digest. So I have to complete a project I promised first. Hopefully this won't last more than a couple more weeks. I would be happy to help with any direction that will improve the quality of audio libraries.

Mar 28 '25 19:03 PetrGlad

> @dvdsk I am sorry for not answering lately

No worries! I just wanted to make sure you were included.

> I have taken on more responsibilities than I could digest.

Happens to all of us. Good luck with it, hope it gets better soon.

> I would be happy to help with any direction that will improve quality of audio libraries.

I'll make sure to keep you in the loop!

Mar 28 '25 19:03 yara-blue

@dvdsk What are your thoughts on the future of the span-situation in Rodio? The issues surrounding this have gone a bit stale recently but life with Rodio in a >2 channel application is suffering. Appreciate your time greatly, I'd just like to hear your perspective of how this fits into Rodio's plans vs other priorities within the project.

May 22 '25 17:05 will3942

> What are your thoughts on the future of the span-situation in Rodio?

I think we mostly landed on a design where we resample at the edges of the audio graph. What we have not figured out is what to do about channel counts changing. Until we hash that out we cannot get rid of spans. Getting rid of spans is needed to get Queue to (efficiently) play gap-less audio.

> life with Rodio in a >2 channel application is suffering

I'm sorry to hear that, if someone has time I would be happy to mentor and/or review any design aimed to fix this. But be warned it will be a large effort (#706 is a partial attempt and gives a good estimate on the amount of work needed). A good start would be reading the history relating to this, that would be: #690, #694 and #712 (this issue). Additionally the discussion in PR #706.

> I'd just like to hear your perspective of how this fits into Rodio's plans vs other priorities within the project.

Because of the major changes required to Rodio to truly fix this, I got talking to other audio crate devs and some major consumers. The outcome of that is https://github.com/RustAudio/audio-ecosystem. Right now we are trying to figure out which audio crate to unify most of the effort behind. To that end we are building a kind of benchmark/demo application to tax candidate audio crates. This is where my main priority is atm, as I think this will benefit rust-audio most in the long term. The rest of the team is working on requested new features; I do not think anyone currently has the bandwidth to tackle this issue.

May 22 '25 19:05 yara-blue