rodio replace current_span_length in favor of parameters

implements #694

currently draft, due to high number of non trivial changes this PR also aims to dramatically extend the test suite of rodio. Hopefully that will catch most bugs.

@roderickvd I could use some help fixing the decoders, they now just have todo!() as parameters_changed implementation. Let me know if you want to work on that. The branch is under the rodio repo so you can just pull it down and push your work to it.

status (ill edit this along the way)

[x] trivial sources implemented (empty, sine etc)
[x] normal sources implemented (skip etc)
[x] extra tests for normal sources
[ ] complex sources implemented (queue and friends)
[ ] extra tests for complex sources
[ ] decoder sources implemented
[ ] extra tests for decoder sources (unsure if needed?)

Feb 24 '25 15:02 yara-blue

Sure thing. Don't have much time this week, will look into it after.

Feb 25 '25 19:02 roderickvd

Note: I split of the NonZero change into its own PR, please review that first then ill rebase this on it.

Feb 28 '25 11:02 yara-blue

first benchmark data is in. Key takeaways:

resampler is 1.3x slower. (needs to be looked into)
high_pass is 3.8x faster :partying_face:
reverb is 1.5x slower, probably a bad buffered implementation. (needs to be investigated)

I'd not expect those changes to be honest. Lets hope we can speed things up otherwise we have some hard choices to make. Lets not dwell on it, I have spend literally zero time on optimization and I have not written this with performance in mind (yet).

Current master

Timer precision: 20 ns
conversions     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ from_sample                │               │               │               │         │
   ├─ f32       564.8 µs      │ 1.377 ms      │ 737 µs        │ 761.8 µs      │ 100     │ 100
   ├─ i16       254.5 µs      │ 360.5 µs      │ 255.9 µs      │ 257.5 µs      │ 100     │ 100
   ╰─ u16       380 µs        │ 479.2 µs      │ 382.2 µs      │ 383.9 µs      │ 100     │ 100

     Running benches/effects.rs (target/release/deps/effects-b27671fcc5098510)
Timer precision: 20 ns
effects         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ agc_enabled  3.681 ms      │ 4.98 ms       │ 3.704 ms      │ 3.74 ms       │ 100     │ 100
├─ amplify      382.1 µs      │ 419.6 µs      │ 382.2 µs      │ 384.5 µs      │ 100     │ 100
├─ fade_out     1.142 ms      │ 1.161 ms      │ 1.148 ms      │ 1.148 ms      │ 100     │ 100
├─ high_pass    6.656 ms      │ 7.304 ms      │ 6.675 ms      │ 6.685 ms      │ 100     │ 100
╰─ reverb       8.281 ms      │ 8.422 ms      │ 8.322 ms      │ 8.326 ms      │ 100     │ 100

     Running benches/pipeline.rs (target/release/deps/pipeline-bb11d51b2daeec9e)
Timer precision: 30 ns
pipeline  fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ long   26.06 ms      │ 27.09 ms      │ 26.15 ms      │ 26.19 ms      │ 100     │ 100
╰─ short  3.788 ms      │ 3.856 ms      │ 3.839 ms      │ 3.835 ms      │ 100     │ 100

     Running benches/resampler.rs (target/release/deps/resampler-9a89b0f001b3c0cb)
Timer precision: 20 ns
resampler         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ no_resampling  2.762 ms      │ 2.991 ms      │ 2.799 ms      │ 2.803 ms      │ 100     │ 100
╰─ resample_to                  │               │               │               │         │
   ├─ 8000        2.739 ms      │ 3.65 ms       │ 2.761 ms      │ 2.769 ms      │ 100     │ 100
   ├─ 11025       3.216 ms      │ 3.324 ms      │ 3.234 ms      │ 3.235 ms      │ 100     │ 100
   ├─ 16000       3.801 ms      │ 3.95 ms       │ 3.829 ms      │ 3.83 ms       │ 100     │ 100
   ├─ 22050       4.628 ms      │ 4.816 ms      │ 4.795 ms      │ 4.774 ms      │ 100     │ 100
   ├─ 44100       2.769 ms      │ 2.901 ms      │ 2.792 ms      │ 2.793 ms      │ 100     │ 100
   ├─ 48000       7.135 ms      │ 7.357 ms      │ 7.23 ms       │ 7.224 ms      │ 100     │ 100
   ├─ 88200       10.82 ms      │ 11.01 ms      │ 10.9 ms       │ 10.9 ms       │ 100     │ 100
   ├─ 96000       12.2 ms       │ 12.62 ms      │ 12.35 ms      │ 12.36 ms      │ 100     │ 100
   ├─ 176400      20.32 ms      │ 20.92 ms      │ 20.52 ms      │ 20.55 ms      │ 100     │ 100
   ├─ 192000      22.71 ms      │ 23.12 ms      │ 22.88 ms      │ 22.89 ms      │ 100     │ 100
   ├─ 352800      38.86 ms      │ 39.6 ms       │ 38.92 ms      │ 38.94 ms      │ 100     │ 100
   ╰─ 384000      45 ms         │ 47.65 ms      │ 45.07 ms      │ 45.15 ms      │ 100     │ 100

This PR

Timer precision: 20 ns
conversions     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ from_sample                │               │               │               │         │
   ├─ f32       753.3 µs      │ 969 µs        │ 919.8 µs      │ 920.8 µs      │ 100     │ 100
   ├─ i16       255 µs        │ 373.2 µs      │ 255.3 µs      │ 258.4 µs      │ 100     │ 100
   ╰─ u16       380 µs        │ 390.1 µs      │ 380.2 µs      │ 381.7 µs      │ 100     │ 100

     Running benches/effects.rs (target/release/deps/effects-4d762cfcb96f6e43)
Timer precision: 20 ns
effects         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ agc_enabled  3.695 ms      │ 4.135 ms      │ 3.699 ms      │ 3.709 ms      │ 100     │ 100
├─ amplify      380 µs        │ 398.3 µs      │ 380.4 µs      │ 382.5 µs      │ 100     │ 100
├─ fade_out     1.141 ms      │ 1.224 ms      │ 1.142 ms      │ 1.146 ms      │ 100     │ 100
├─ high_pass    1.711 ms      │ 1.733 ms      │ 1.714 ms      │ 1.715 ms      │ 100     │ 100
╰─ reverb       12.65 ms      │ 13.03 ms      │ 12.68 ms      │ 12.69 ms      │ 100     │ 100

     Running benches/pipeline.rs (target/release/deps/pipeline-113220290e55aa6e)
Timer precision: 20 ns
pipeline  fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ long   30.83 ms      │ 34.99 ms      │ 30.87 ms      │ 30.92 ms      │ 100     │ 100
╰─ short  1.711 ms      │ 1.73 ms       │ 1.714 ms      │ 1.714 ms      │ 100     │ 100

     Running benches/resampler.rs (target/release/deps/resampler-6724b60d4573a4a8)
Timer precision: 20 ns
resampler         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ no_resampling  3.819 ms      │ 4.011 ms      │ 3.83 ms       │ 3.833 ms      │ 100     │ 100
╰─ resample_to                  │               │               │               │         │
   ├─ 8000        3.196 ms      │ 3.71 ms       │ 3.584 ms      │ 3.552 ms      │ 100     │ 100
   ├─ 11025       3.244 ms      │ 6.276 ms      │ 3.63 ms       │ 3.651 ms      │ 100     │ 100
   ├─ 16000       4.125 ms      │ 4.529 ms      │ 4.479 ms      │ 4.474 ms      │ 100     │ 100
   ├─ 22050       4.709 ms      │ 4.764 ms      │ 4.721 ms      │ 4.726 ms      │ 100     │ 100
   ├─ 44100       3.816 ms      │ 3.936 ms      │ 3.832 ms      │ 3.869 ms      │ 100     │ 100
   ├─ 48000       8.168 ms      │ 8.327 ms      │ 8.277 ms      │ 8.275 ms      │ 100     │ 100
   ├─ 88200       11.43 ms      │ 12.03 ms      │ 11.49 ms      │ 11.56 ms      │ 100     │ 100
   ├─ 96000       13.98 ms      │ 14.23 ms      │ 14.16 ms      │ 14.17 ms      │ 100     │ 100
   ├─ 176400      21.14 ms      │ 22.38 ms      │ 21.29 ms      │ 21.35 ms      │ 100     │ 100
   ├─ 192000      25.14 ms      │ 25.84 ms      │ 25.43 ms      │ 25.47 ms      │ 100     │ 100
   ├─ 352800      41.62 ms      │ 43.17 ms      │ 42.3 ms       │ 42.25 ms      │ 100     │ 100
   ╰─ 384000      46.84 ms      │ 67.99 ms      │ 48.41 ms      │ 48.47 ms      │ 100     │ 100

Feb 28 '25 18:02 yara-blue

Semantics of channels, sample_rate & parameters_changed

Current situation: greedy

Semantics are best explained with an example:

next -> (sample at sample_rate S1 and channel count C1)
parameters_changed -> false
next -> (sample at sample_rate S1 and channel count C1)
parameters_changed -> true
next -> (sample at sample_rate S2 and channel count C2)
parameters_changed -> false
next -> (sample at sample_rate S2 and channel count C2)
parameters_changed -> false
next -> (sample at sample_rate S2 and channel count C2)

In words, before you call next you get that the parameters are going to change. Said differently you get whether the next sample has a different channel count or sample rate then the previous sample.

Pro:

Sources that need to handle channel count and sample rate changes are simpler.
Similar semantics to current_span_length, you know ahead of time when to change.

Cons:

Ever source that can cause sample_rate or channel_count to change gets rather complex, needing to know ahead of time if that is going to happen.
Seems slower to me because of all the buffering

Alternative: lazy

Semantics again explained with an example:

next -> (sample at sample_rate S1 and channel count C1)
parameters_changed -> false
next -> (sample at sample_rate S1 and channel count C1)
parameters_changed -> false
next -> (sample at sample_rate S2 and channel count C2)
parameters_changed -> true
next -> (sample at sample_rate S2 and channel count C2)
parameters_changed -> false
next -> (sample at sample_rate S2 and channel count C2)

In words, after you call next you learn whether the sample you just got has a different sample rate or channel count then the samples before.

Pro:

Ever source that can cause sample_rate or channel_count get simpler when they cause a change they can flip a bool, reset it after the next sample has been emitted (consumer calls next).
Similar semantics to current_span_length, you know ahead of time when to change.
Sources that need to handle channel count and sample rate changes are simpler.

Cons:

Larger difference to how things are now
I just wrote 2.5k lines with the approach above and I do not want to change them :cry:

Note that sources that need to know if the sample_rate or channel_count changes do not need to buffer across calls to next. They:

call next
call parameters_changed
realize they did
query the new parameters using channel_count and sample_rate
handle the sample together with the new parameters

Mar 01 '25 14:03 yara-blue

EDIT: might not do the thing below but go for the greedy approach anyway, see: https://github.com/RustAudio/rodio/pull/706#issuecomment-2694150147

changing semantics for lazy option above, todo list:

[x] buffered
[ ] channel_volume
[ ] crossfade
[ ] delay
[ ] done
[ ] empty
[ ] empty_callback
[ ] fadein
[ ] fadeout
[ ] from_factory
[ ] from_iter
[ ] linear_ramp
[ ] pausable
[x] peekable
[ ] position
[ ] repeat
[ ] skip_duration
[ ] skippable
[ ] stoppable
[ ] take_duration
[ ] take_samples
[x] take_span*, removed: impossible to implement with lazy*
[ ] uniform
[ ] queue

Mar 01 '25 16:03 yara-blue

yet another idea: have both styles of parameters_changed available (compiler can probably get rid of those not used as dead code?)

Mar 01 '25 18:03 yara-blue

Our linear resampler might be even broken. There are some comments that suggest it was not completely polished. It used numerator/divisor calculation which now may probably be more straightforward. I would leave this to a separate task. After this change is made.

Mar 02 '25 20:03 PetrGlad

Regarding that greedy/lazy discussion. I do not really see how flagging before next frame may cause problems. I would like to keep changes to in-between frames only still. Sources that change number parameters themselves have all information about the change and should have no problems flagging it. The ones that do not change parameters or do not depend on them can just relay call to parameters_changed. The sources that combine inputs should process whole frames and then check whether a change is needed. Can you give an example how this approach may affect to buffering? If whole frame has to be processed it should be read completely anyway, if samples are independent, just loop for number of samples and check then.

With that lazy option, flagging after reading next sample and releasing is depending on next() call is confusing. Next source would have to read the sample, keep it somewhere then read new parameters than maybe re-allocate its buffers and then put that sample as the first one... On other words it is not clear to me which problem it is supposed to solve.

Mar 02 '25 20:03 PetrGlad

I do not really see how flagging before next frame may cause problems.

It does not cause problems but it may require a more complex implementation. Honestly both approaches are complex and it seems I have made a mistake making it worse. Ill get back on that later. I have been designing #712 yesterday and want to try working on that a while. Then we could compare performance and code complexity. Getting back to your question, lets say you are the queue source:

you have to provide the correct parameters for the next sample
at any point the current source may end, you do not know when

Thus you need to peek one sample ahead so you know if your current sample is actually the last. This kind of needing to peek one ahead happens a lot, I wrote a PeekingSource. To handle a span of one sample PeekingSource needs to when it reads ahead track: parameter changes, sample rate and channel count.

You might have already seen the mistake, we can just forbid spans of length one. They make no sense anyway, why would you want 1 sample (at 44_100 samples a second!) to have a different channel count? So peeking is simple again, removing this complexity. Quite a few sources need to know if 'the end is coming' so this unneeded complexity was duplicated at a few places. That is what prompted me to look at alternatives.

I am still working on #712, I still think that is valuable and if its partially implemented we can compare its performance to this PR.

Mar 03 '25 12:03 yara-blue

replace current_span_length in favor of parameters_changed

Current master

This PR

Semantics of channels, sample_rate & parameters_changed

Current situation: greedy

Alternative: lazy