rodio icon indicating copy to clipboard operation
rodio copied to clipboard

span_length is the wrong abstraction

Open yara-blue opened this issue 11 months ago • 34 comments

When decoding audio you get chunks with constant sample rate and channel count. It seemed natural to extend that into rodio's audio pipeline. The consumer of audio gets the guarantee that the sample rate and channel count will not change for the next n-samples. The consumer can then deal with that like this:

let mut resampler = Resampler::new();
loop {
  resampler.set_sample_rate(source.sample_rate());
  for _ in 0..source.current_span_len() {
    resample.resample(source.next()?);
  }
}

This seems very efficient right? It is however no different then:

let mut resampler = Resampler::new();
loop {
  resampler.set_sample_rate(source.sample_rate());
  while !source.parameters_changed() {
    resample.resample(source.next()?);
  }
}

Especially if we let the compiler know that the parameters usually do not change using:

impl Source {
  #[cold]
  fn parameters_changed() -> bool;

  ..
}

Currently at least skipping, seeking and queue are non-trivial in rodio since we may not break our promised span_len. That span len is however never used except to create the boolean parameters_changed via a comparison to an increasing index. Replacing current_span_len by parameters_changed makes rodio code far simpeler. For example seek:

impl Source {
  fn seek(&mut self, some_params: some_type) {
    self.parameters_changed = true
    // and some code to guarantee frame alignment
   //  but nothing for span alignment since spans do not exist anymore!
    ..
  }

  fn parameters_changed() -> bool {
    self.parameters_changed
  }
}

All the complex code in queue can just be removed, it can simply forward parameters_changed until a source ends: then it just switches it to true. Right now it is has to play silence to fill the current span if a source ends early. So this change will not only make the code far far simpler it will improve audio latency!

Regarding breaking the Source API, an argument along the line of https://github.com/RustAudio/rodio/issues/690#issuecomment-2612265055 can be made. This would also replace #690.

I think this is a wonderful fix/improvement. I might be wrong. Looking forward to your input.

yara-blue avatar Jan 26 '25 10:01 yara-blue

Interesting idea.

Couple thoughts:

  • When would you set parameters_changed back to false? On the Sample after the Sample that changed parameters?
  • ChannelVolume / ChannelCountConverter / UniformSourceIterator may be hard to implement if there is no guarantee the parameters will not change between each 0..num_channels so perhaps the parameters should not change until all Samples have been emitted for the current num_channels (e.g. for a 6 channel source the parameters can only change every 6 channels). Or maybe we should move to typed streams like @PetrGlad suggested to avoid this (https://github.com/RustAudio/rodio/pull/493#issuecomment-2521189273)

will3942 avatar Jan 26 '25 18:01 will3942

I have to read it again with the fresh mind but even with this version the source should guarantee that whole frame (samples for all channels) is received, otherwise it invites some complicated logic again. Instead of boolean I'd consider a revision counter that is increased every time parameters change, this is usually my favorite in such situations. Or if it is a boolean it should be be reset every time target sink sees it's value. Even if spans are not visible, I guess the our decoder wrappers will still need to handle a lot of cases?

PetrGlad avatar Jan 26 '25 19:01 PetrGlad

// This may need some optimizations, but I thought of having a struct that includes all the stream parameters that can be queried from a source. Besides our existing ones lie channel count, and sample rate that struct can include channel mapping (say channel 4 is "rear left").

PetrGlad avatar Jan 26 '25 19:01 PetrGlad

ChannelVolume / ChannelCountConverter / UniformSourceIterator may be hard to implement if there is no guarantee the parameters will not change between each 0..num_channels

They currently already may, but they practically do not change in between frames.

perhaps the parameters should not change until all Samples have been emitted for the current num_channels (e.g. for a 6 channel source the parameters can only change every 6 channels).

It makes no sense to change sample-rate or channel count mid frame. I think that is a fine demand to add to the source API.

related

Do we demand sources only emit whole frames?

Its a balance,

  • if we do that the API becomes stricter, adding new sources will be harder. It also opens up a new source of bugs. emitting a half frame at the end will be noticeable through subtle bugs in other parts of rodio (channels switching around).
  • Leaving the API as is puts all the work on us, though I think only queue needs to handle this right? And it is relatively doable.

yara-blue avatar Jan 27 '25 10:01 yara-blue

Instead of boolean I'd consider a revision counter that is increased every time parameters change, this is usually my favorite in such situations. Or if it is a boolean it should be be reset every time target sink sees it's value.

my idea was to use the boolean as a flag. The consumer switches it back as soon as it reads it. There is no risk of race conditions since its all serial anyway. The consumer further more does not need to know if it missed a change, it just needs to know one or more have happened.

yara-blue avatar Jan 27 '25 11:01 yara-blue

@dvdsk What you mean by "stricter" there? Usually, better defined APIs are easier to integrate with. Did you mean those in-frame changes?

I think I like the idea, as long as that flag reading is cheap. And since it is all in the same thread, no additional sync is needed.

I thought of a scenario when the source is handed over somewhere else, but it looks like a boolean should still be sufficient there.

I would have read and reset of that boolean in the same function to avoid situation when a sink forgets to clear it. But I cannot think of a good name for it, since normally names like get_format_changed() imply that it is read only. It could be take_format_has_changed() but that looks verbose.

PetrGlad avatar Jan 28 '25 08:01 PetrGlad

Do we demand sources only emit whole frames?

Its a balance,

  • if we do that the API becomes stricter, adding new sources will be harder. It also opens up a new source of bugs. emitting a half frame at the end will be noticeable through subtle bugs in other parts of rodio (channels switching around).
  • Leaving the API as is puts all the work on us, though I think only queue needs to handle this right? And it is relatively doable.

I don't understand the first point around "it opens up a new source of bugs"? Can you clarify please? In my mind adding a strict enforced requirement that sources can only emit full frames removed the class of bugs that are being solved at the moment where channels are switching around?

I think only queue needs to handle this right?

I think we have just scratched the surface of the issues of channels switching/frames becoming mixed up. (e.g. it also needs solving in the resampler https://github.com/RustAudio/rodio/issues/252)

IMO enforcing only emitting full frames would remove a class of bugs along with making the code easier to maintain by always being aware of which channel/frame you are at. Perhaps that is me being naive but that is my understanding.

will3942 avatar Jan 28 '25 08:01 will3942

In my mind adding a strict enforced requirement that sources can only emit full frames removed the class of bugs that are being solved at the moment where channels are switching around?

I do not count on end users to remember every detail of the Source documentation, a mistake is easily made. So any requirement not encoded by the type system is easily broken without being noticed. Especially since the result of breaking this requirement will emerge far away from where it was broken (for example in ChannelVolume).

I think we have just scratched the surface of the issues of channels switching/frames becoming mixed up. (e.g. it also needs solving in the resampler https://github.com/RustAudio/rodio/issues/252)

I can tell you the pile of issues related to span length is huge, take a look at the new queue code in that PR... Its scares me how many special cases need to be checked. That code is not refactored and probably wont be merged since switching to span_changed will remove all those edge cases.

IMO enforcing only emitting full frames would remove a class of bugs along with making the code easier to maintain by always being aware of which channel/frame you are at. Perhaps that is me being naive but that is my understanding.

No your right there, its just me worrying about users messing up that requirement, I would like to see how hard it is to take care of it for them.

yara-blue avatar Jan 28 '25 09:01 yara-blue

/// Sorry, I thought the previous comment was lost so reposted it.

@dvdsk What do you mean by "stricter"? Avoiding those in-frame changes? The problem is that I would agree to relax rules for sources if the complexity would be only on the source side. Say, it would be a lot harder to emit whole frames than to check for changes on the consuming side. But to me it seems that this would simplify a little on one side and make it a lot more complicated on the other. Queue is not the only consumer we or users may have. All the filters/sinks would have to handle it. Do we have input sources/decoders that may be difficult to implement with whole-frame requirement?

I think I like the idea of having a flag for format changes. Since this check is in the same thread, it should be cheap. I thought of scenarios when a source is handed over somewhere else, but it looks like just a boolean is sufficient also in that case.

I would consider the getter to reset the flag immediately, with a method like take_format_has_changed(), or poll_format_has_changed() ("get" normally implies that it is read-only), so the name looks like it is reading from a message queue.

PetrGlad avatar Jan 28 '25 09:01 PetrGlad

@dvdsk What you mean by "stricter" there? Usually, better defined APIs are easier to integrate with.

The current API is stricter then the proposed because the current API guarantees you no changes in parameters for the next N frames. The proposed API makes no such guarantee and only lets you know if the parameters just changed. Missing that time guarantee its less strict in my eyes. You could build the new API on top of the old. You can not however guarantee the parameters will not change for N-samples knowing only if they just changed.

(well you could, you could buffer N samples we will probably do that in the hifi resampler. Thats not a problem though, it needs a buffer anyway.)

yara-blue avatar Jan 28 '25 09:01 yara-blue

I would have read and reset of that boolean in the same function to avoid situation when a sink forgets to clear it.

take_format_has_changed()

I like that idea!

yara-blue avatar Jan 28 '25 09:01 yara-blue

/// Sorry, I thought the previous comment was lost..

it was not :)

yara-blue avatar Jan 28 '25 09:01 yara-blue

So any requirement not encoded by the type system is easily broken without being noticed. Especially since the result of breaking this requirement will emerge far away from where it was broken (for example in ChannelVolume).

Well, yes usually it is a good idea to encode important constraints in a type system, but with this one can go only so far. The stricter it is the more complicated it becomes. In this case, I imagine, a good compromise could be is sending slices, one for frame, but then each channel count will require a different type... Maybe we can have separate types for aligned or unaligned sources. I think it is possible to have a wrapper that aligns frames. It can detect incomplete frame on format change and fill the remainder with some kind of default (like frame average or most recent known values). It could also emit a tracing event for diagnostics. An alternative implementation could collect and pass whole frames and discard incomplete ones. With this filter the alignment logic can be shared.

PetrGlad avatar Feb 01 '25 10:02 PetrGlad

Reading the comments above I think you guys may already be aware, but just to make it explicit: we should not rely on sources providing full frames. Frames could be variable length, contain malformed data to skip (if not assumed as silence or zero-hold), or be shorter at the end of a track. For robustness, I advocate to not make any assumptions about it.

The Symphonia decoders deal with this by allocating a full-frame buffer, but only returning a slice with the samples successfully decoded (buffer[..n_written]).

Working with frames instead of iterators can be a useful optimization, particularly to resample.

Now for a wild idea, we could offer choice:

  • work with iterators like today (and suggest to check for audio specification changes while iterating);
  • "take" the next sample one-by-one;
  • read the next buffer up to a number of bytes (like Read::read).

...all of them consuming from the same buffer maintained by the source.

Another wild idea to work with frames is to have an enum with variants like FullSize and Deviant. I have no idea this would actually be useful, but just to throw it out there.

roderickvd avatar Feb 03 '25 20:02 roderickvd

a good compromise could be is sending slices, one for frame, but then each channel count will require a different type

I think you meant arrays? slices are the same type but then we get into borrow hell.

Reading the comments above I think you guys may already be aware, but just to make it explicit: we should not rely on sources providing full frames.

It might be worthwhile to define what we are talking about, you seem to be talking about external sources? The discussion above is about the rodio Source API contract. There is a balance between a strict contract that makes it easy rodio parts to communicate and fit together and a looser one that makes it easy to add new external sources by implementing Source for them.

We have to ensure frames are complete somewhere given the OS needs a constant stream of interleaved samples. We can do that at the border between external sources and rodio's Source and uphold that within rodio or we can do it at the edge to the OS. I think its easier and more performant to do it at the incoming edge then within rodio. That is also the current behavior, I am strongly against changing that.

What seems undefined is if a source may end mid frame. It would be more consistent to disallow that (note I flipped on this, I was in favor of allowing mid frame ending). Though disallowing that makes extending rodio harder, that could be solved with some sort of convertor along the lines of your suggestion in #678.

yara-blue avatar Feb 04 '25 14:02 yara-blue

It might be worthwhile to define what we are talking about, you seem to be talking about external sources? The discussion above is about the rodio Source API contract. There is a balance between a strict contract that makes it easy rodio parts to communicate and fit together and a looser one that makes it easy to add new external sources by implementing Source for them.

Yes, maybe I was on a different train of thought. If so just tell me to take it elsewhere :wink:

We have to ensure frames are complete somewhere given the OS needs a constant stream of interleaved samples. We can do that at the border between external sources and rodio's Source and uphold that within rodio or we can do it at the edge to the OS. I think its easier and more performant to do it at the incoming edge then within rodio. That is also the current behavior, I am strongly against changing that.

Does Rubato require it or the linear resampler too? With Rubato I noticed that you opted for SincFixedIn and I was wondering why not change to SincFixedOut to alleviate that problem.

What seems undefined is if a source may end mid frame. It would be more consistent to disallow that (note I flipped on this, I was in favor of allowing mid frame ending). Though disallowing that makes extending rodio harder, that could be solved with some sort of convertor along the lines of your suggestion in #678.

When decoding there's simply no guarantee that the frame will be as full as indicated or expected. One reason indeed is ending mid-frame at the end of the stream (particularly when trimming in gapless mode). Another is malformed data. If Rodio makes strict assumptions about that today, then this could already be an issue today. Of course most sources are well-formed, and frames can be zero-padded. So while it may seldom rear its ugly head, for robustness I would advocate handling such cases.

roderickvd avatar Feb 04 '25 19:02 roderickvd

Does Rubato require it or the linear resampler too?

they both do, but its not really a problem. rubato ideally gets a large bunch of frames anyway so we will have to buffer/chunk it anyway.

With Rubato I noticed that you opted for SincFixedIn and I was wondering why not change to SincFixedOut to alleviate that problem.

We'll see what fits best, I'm leaving that PR until we get the major work done (saves me a lot of rebase effort).

Another is malformed data. If Rodio makes strict assumptions about that today, then this could already be an issue today. Of course most sources are well-formed, and frames can be zero-padded. So while it may seldom rear its ugly head, for robustness I would advocate handling such cases.

I agree, handling such cases is the best way forward. Zero padding incurs very little overhead. But are we sure the decoders do not do this for us already?

yara-blue avatar Feb 05 '25 22:02 yara-blue

Another is malformed data. If Rodio makes strict assumptions about that today, then this could already be an issue today. Of course most sources are well-formed, and frames can be zero-padded. So while it may seldom rear its ugly head, for robustness I would advocate handling such cases.

I agree, handling such cases is the best way forward. Zero padding incurs very little overhead. But are we sure the decoders do not do this for us already?

I know that Symphonia trims (not pads) the buffer.

So I haven't tested all of the waters yet in the Rodio codebase, but from where I am, and the four issues you created surrounding span_length, it seems that requiring a strictly fixed span length (unlike it being a hint) introduces a lot of complexity.

With seeking, I have to think hard if it's even possible to guarantee it without having the caller "reset" its span length and iterator assumptions.

roderickvd avatar Feb 06 '25 22:02 roderickvd

Another thing to think of is that when we check for parameter changes on every frame this may prevent some compiler optimizations. Maybe that is not super important though. In any case if we want better throughput it is possible to use buffers for processing data in bulk (say, with SIMD), that would increase delays of course. E.g. Rubato says that it supports SIMD.

PetrGlad avatar Feb 17 '25 12:02 PetrGlad

Another thing to think of is that when we check for parameter changes on every frame this may prevent some compiler optimizations.

I do not really think so. Maybe there is some loop unrolling but then that would need to be guarded by a branch because the current_span _length can have any value (as far as the compiler knows). On the other hand compilers are magic so maybe?

In any case if we want better throughput it is possible to use buffers

That would work at the exchange of latency, since we can not abort early.

Rubato says that it supports SIMD

The wip rubato backend I wrote takes a 'fixed' (dynamic) buffer size. It incurs some latency, I kept the buffer size to the minimum required (which was still a lot (2k samples I think?))

yara-blue avatar Feb 17 '25 12:02 yara-blue

That would work at the exchange of latency, since we can not abort early.

Yes, that is what I said. Also buffer can have some predictable size (say 32-64 samples).

2000 samples is about 45 ms at 44100? Or is this some kind of sliding window? What would be the delay introduced by the resampler then?

PetrGlad avatar Feb 17 '25 13:02 PetrGlad

2000 samples is about 45 ms at 44100? Or is this some kind of sliding window? What would be the delay introduced by the resampler then?

I just checked it, it is about 1k samples so that's about 27ms. Not ideal for a rhythm game. But a rhythm game could pre-resample their files to common sample rates and then skip resampling when its not needed.

yara-blue avatar Feb 17 '25 17:02 yara-blue

I was about to get started on implementing this however its worth it to think about the API a little.

The options I've investigated:

  • we replace current_span_len with a function returning whether a span has started, something like new_span_started or parameters_changed.
  • replace current_span_len, channels & sample_rate with current_params returning an option of a parameter struct. Nice from an API standpoint since its the impossible to forget re-checking channels and sample_rate. Probably does not perform well unless enough compiler magic applies.
  • remove current_span_len and let the consumer always check channels & sample_rate. Nice from an API standpoint, again the question is does it compile to something nice?

I'll thought about checking out the performance (seeing if the compiler can do its magic). As I was writing the perf tests I realized the latter two options wont scale. If we merge some version of #657 any consumer of Source now needs to track 3 continually changing outputs (channels, sample_rate and channel_mask).

This leaves us with the first option I like the name parameters_changed best atm. More ideas are very welcome. The bool returned will be true for one sample and then false again. The consumer only needs to check the return value every sample. When it is true one or more parameters has changed. A resampler might want to collect samples into a buffer as long as the buffer is not full and the parameters have not changed.

yara-blue avatar Feb 19 '25 14:02 yara-blue

I want to get started on this tomorrow, before I do we need to know one last thing. Also moving the discussion with @roderickvd from https://github.com/RustAudio/rodio/pull/697.

  • Do we require from sources that spans end at frame boundaries?

I am going with yes

We have discussed arguments for and against, I have the feeling that everyone is floating between yes or no. I think we just have to go with something and see what that means once we have implemented it. We can always, with some pain, change it.

I am planning to add a Check source that will check frame/span alignment. We could consider adding it automatically to mix during debug builds. That should take care of most instances of user mode sources messing up frame/span alignment.

yara-blue avatar Feb 22 '25 12:02 yara-blue

@dvdsk So the idea was to have the parameters getter return Option to indicate changes (return None if nothing has changed)? I thought of having a struct with parameters. I like the idea, but if is used we have to think how to pass these around. I would be wary if this would require heap allocation - this should happen in teal-time thread. If all these structs will have the same size then the parameters could just be copied. On positive side, if a filter does not change the stream parameters it can just pass the struct further. I am not sure if just getting a boolean flag is necessarily more efficient than checking that an Option is None.

If we have channel descriptions, how those will be passed around? If it is a variable-sized collection that is returned from source, the code will be equally inefficient in both cases. We may have some optimization by having some registry holding all the mappings and let sources only pass a handle around (this may be an integer). Or maybe use Arcs for that. I can imagine a channel map as, say 16 channels -> 16 speaker locations one-to-one map. That could be 4 bits per stream channel (e.g. 1 byte for stereo) up-to 8 bytes struct. So if we do not have too many speaker types and limit number of supported channels, this map can also be Copy.

// After 16 months without job I have recently found one, finally. So will likely spend some time on this project only on weekends.

PetrGlad avatar Feb 22 '25 12:02 PetrGlad

It is my default choice to have some conventions that the code may rely on inside the program (inside rodio) and use stricter checks and assertions on data that comes in from outside. The check source can be a normal one amplify(0.2).check_frames().

I will try submitting the samples conversion today.

PetrGlad avatar Feb 22 '25 12:02 PetrGlad

// After 16 months without job I have recently found one, finally. So will likely spend some time on this project only on weekends.

first of all congratulations on the new job! I am grateful for all the work you put into rodio. I completely get that you have way less time now. Use your weekends to recharge and have fun, that could still be working on rodio but it might very well be something different. Good luck!

@dvdsk So the idea was to have the parameters getter return Option to indicate changes (return None if nothing has changed)?

yes that is one of the options, however it makes getting a partially consumed source difficult. An alternative is to let the consumer watch for changes. Disadvantages:

  • you are doing 2 or 3 compares instead of checking a single boolean
  • you have to store the parameters so you can compare them

If we have channel descriptions, how those will be passed around?

no idea, that issue has not made progress (and that is not a problem!) but I want to take a future where its merged into account

If it is a variable-sized collection that is returned from source, the code will be equally inefficient in both cases.

Yes but a source could cache them more efficiently if it can rely on the boolean to signal change.

The check source can be a normal one amplify(0.2).check_frames()

I like that :) that's similar to iterators inspect. I still would like to include the check functionality in mix by default on debug builds. Better to catch things early. Since a user can add their own rodio source's (like filters & effects on other sources) simply guarding the incoming edge does not seem like enough.

yara-blue avatar Feb 23 '25 18:02 yara-blue

I will try submitting the samples conversion today.

Please take all the time you need. And if you rather drop it I can take it over? No need to finish that while also starting at a new job.

yara-blue avatar Feb 23 '25 18:02 yara-blue

Another thing to think of is that when we check for parameter changes on every frame this may prevent some compiler optimizations.

I do not really think so. Maybe there is some loop unrolling but then that would need to be guarded by a branch because the current_span _length can have any value (as far as the compiler knows). On the other hand compilers are magic so maybe?

In any case if we want better throughput it is possible to use buffers

That would work at the exchange of latency, since we can not abort early.

Rubato says that it supports SIMD

The wip rubato backend I wrote takes a 'fixed' (dynamic) buffer size. It incurs some latency, I kept the buffer size to the minimum required (which was still a lot (2k samples I think?))

w.r.t. this discussion of SIMD auto-vectorization, branching, and what not, I thought this article was very interesting: https://matklad.github.io/2023/04/09/can-you-trust-a-compiler-to-optimize-your-code.html.

Key takeaways are that to leverage compiler SIMD auto-magic, we need to:

  1. Process in chunks of fixed or at least predictable size
  2. Within that processing, do not have any branches
  3. Take care to not have short-circuiting evaluations by using bit-wise instead of logical operators: & instead of && etc.

roderickvd avatar Mar 06 '25 12:03 roderickvd

Key takeaways are that to leverage compiler SIMD auto-magic, we need to:

1. Process in _chunks_ of fixed or at least predictable size

2. Within that processing, do not have any branches

3. Take care to not have short-circuiting evaluations by using bit-wise instead of logical operators: `&` instead of `&&` etc.
  1. I wonder if its enough to have the cpal take the samples or if we should add optional buffering (to force chunking in the rest of the chain).

  2. this one would only really be possible when abolishing spans. I found it impossible to not have an if in a next call here and there when dealing with spans.

Only real way to find this out is to test it I think. We need span & spanless rodio and for each put a little effort in optimizations. Only then we can really make a choice.

yara-blue avatar Mar 06 '25 15:03 yara-blue