replace current_span_length in favor of parameters_changed
implements #694
currently draft, due to high number of non trivial changes this PR also aims to dramatically extend the test suite of rodio. Hopefully that will catch most bugs.
@roderickvd I could use some help fixing the decoders, they now just have todo!() as parameters_changed implementation. Let me know if you want to work on that. The branch is under the rodio repo so you can just pull it down and push your work to it.
status (ill edit this along the way)
- [x] trivial sources implemented (empty, sine etc)
- [x] normal sources implemented (skip etc)
- [x] extra tests for normal sources
- [ ] complex sources implemented (queue and friends)
- [ ] extra tests for complex sources
- [ ] decoder sources implemented
- [ ] extra tests for decoder sources (unsure if needed?)
Sure thing. Don't have much time this week, will look into it after.
Note: I split of the NonZero change into its own PR, please review that first then ill rebase this on it.
first benchmark data is in. Key takeaways:
- resampler is 1.3x slower. (needs to be looked into)
- high_pass is 3.8x faster :partying_face:
- reverb is 1.5x slower, probably a bad
bufferedimplementation. (needs to be investigated)
I'd not expect those changes to be honest. Lets hope we can speed things up otherwise we have some hard choices to make. Lets not dwell on it, I have spend literally zero time on optimization and I have not written this with performance in mind (yet).
Current master
Timer precision: 20 ns
conversions fastest │ slowest │ median │ mean │ samples │ iters
╰─ from_sample │ │ │ │ │
├─ f32 564.8 µs │ 1.377 ms │ 737 µs │ 761.8 µs │ 100 │ 100
├─ i16 254.5 µs │ 360.5 µs │ 255.9 µs │ 257.5 µs │ 100 │ 100
╰─ u16 380 µs │ 479.2 µs │ 382.2 µs │ 383.9 µs │ 100 │ 100
Running benches/effects.rs (target/release/deps/effects-b27671fcc5098510)
Timer precision: 20 ns
effects fastest │ slowest │ median │ mean │ samples │ iters
├─ agc_enabled 3.681 ms │ 4.98 ms │ 3.704 ms │ 3.74 ms │ 100 │ 100
├─ amplify 382.1 µs │ 419.6 µs │ 382.2 µs │ 384.5 µs │ 100 │ 100
├─ fade_out 1.142 ms │ 1.161 ms │ 1.148 ms │ 1.148 ms │ 100 │ 100
├─ high_pass 6.656 ms │ 7.304 ms │ 6.675 ms │ 6.685 ms │ 100 │ 100
╰─ reverb 8.281 ms │ 8.422 ms │ 8.322 ms │ 8.326 ms │ 100 │ 100
Running benches/pipeline.rs (target/release/deps/pipeline-bb11d51b2daeec9e)
Timer precision: 30 ns
pipeline fastest │ slowest │ median │ mean │ samples │ iters
├─ long 26.06 ms │ 27.09 ms │ 26.15 ms │ 26.19 ms │ 100 │ 100
╰─ short 3.788 ms │ 3.856 ms │ 3.839 ms │ 3.835 ms │ 100 │ 100
Running benches/resampler.rs (target/release/deps/resampler-9a89b0f001b3c0cb)
Timer precision: 20 ns
resampler fastest │ slowest │ median │ mean │ samples │ iters
├─ no_resampling 2.762 ms │ 2.991 ms │ 2.799 ms │ 2.803 ms │ 100 │ 100
╰─ resample_to │ │ │ │ │
├─ 8000 2.739 ms │ 3.65 ms │ 2.761 ms │ 2.769 ms │ 100 │ 100
├─ 11025 3.216 ms │ 3.324 ms │ 3.234 ms │ 3.235 ms │ 100 │ 100
├─ 16000 3.801 ms │ 3.95 ms │ 3.829 ms │ 3.83 ms │ 100 │ 100
├─ 22050 4.628 ms │ 4.816 ms │ 4.795 ms │ 4.774 ms │ 100 │ 100
├─ 44100 2.769 ms │ 2.901 ms │ 2.792 ms │ 2.793 ms │ 100 │ 100
├─ 48000 7.135 ms │ 7.357 ms │ 7.23 ms │ 7.224 ms │ 100 │ 100
├─ 88200 10.82 ms │ 11.01 ms │ 10.9 ms │ 10.9 ms │ 100 │ 100
├─ 96000 12.2 ms │ 12.62 ms │ 12.35 ms │ 12.36 ms │ 100 │ 100
├─ 176400 20.32 ms │ 20.92 ms │ 20.52 ms │ 20.55 ms │ 100 │ 100
├─ 192000 22.71 ms │ 23.12 ms │ 22.88 ms │ 22.89 ms │ 100 │ 100
├─ 352800 38.86 ms │ 39.6 ms │ 38.92 ms │ 38.94 ms │ 100 │ 100
╰─ 384000 45 ms │ 47.65 ms │ 45.07 ms │ 45.15 ms │ 100 │ 100
This PR
Timer precision: 20 ns
conversions fastest │ slowest │ median │ mean │ samples │ iters
╰─ from_sample │ │ │ │ │
├─ f32 753.3 µs │ 969 µs │ 919.8 µs │ 920.8 µs │ 100 │ 100
├─ i16 255 µs │ 373.2 µs │ 255.3 µs │ 258.4 µs │ 100 │ 100
╰─ u16 380 µs │ 390.1 µs │ 380.2 µs │ 381.7 µs │ 100 │ 100
Running benches/effects.rs (target/release/deps/effects-4d762cfcb96f6e43)
Timer precision: 20 ns
effects fastest │ slowest │ median │ mean │ samples │ iters
├─ agc_enabled 3.695 ms │ 4.135 ms │ 3.699 ms │ 3.709 ms │ 100 │ 100
├─ amplify 380 µs │ 398.3 µs │ 380.4 µs │ 382.5 µs │ 100 │ 100
├─ fade_out 1.141 ms │ 1.224 ms │ 1.142 ms │ 1.146 ms │ 100 │ 100
├─ high_pass 1.711 ms │ 1.733 ms │ 1.714 ms │ 1.715 ms │ 100 │ 100
╰─ reverb 12.65 ms │ 13.03 ms │ 12.68 ms │ 12.69 ms │ 100 │ 100
Running benches/pipeline.rs (target/release/deps/pipeline-113220290e55aa6e)
Timer precision: 20 ns
pipeline fastest │ slowest │ median │ mean │ samples │ iters
├─ long 30.83 ms │ 34.99 ms │ 30.87 ms │ 30.92 ms │ 100 │ 100
╰─ short 1.711 ms │ 1.73 ms │ 1.714 ms │ 1.714 ms │ 100 │ 100
Running benches/resampler.rs (target/release/deps/resampler-6724b60d4573a4a8)
Timer precision: 20 ns
resampler fastest │ slowest │ median │ mean │ samples │ iters
├─ no_resampling 3.819 ms │ 4.011 ms │ 3.83 ms │ 3.833 ms │ 100 │ 100
╰─ resample_to │ │ │ │ │
├─ 8000 3.196 ms │ 3.71 ms │ 3.584 ms │ 3.552 ms │ 100 │ 100
├─ 11025 3.244 ms │ 6.276 ms │ 3.63 ms │ 3.651 ms │ 100 │ 100
├─ 16000 4.125 ms │ 4.529 ms │ 4.479 ms │ 4.474 ms │ 100 │ 100
├─ 22050 4.709 ms │ 4.764 ms │ 4.721 ms │ 4.726 ms │ 100 │ 100
├─ 44100 3.816 ms │ 3.936 ms │ 3.832 ms │ 3.869 ms │ 100 │ 100
├─ 48000 8.168 ms │ 8.327 ms │ 8.277 ms │ 8.275 ms │ 100 │ 100
├─ 88200 11.43 ms │ 12.03 ms │ 11.49 ms │ 11.56 ms │ 100 │ 100
├─ 96000 13.98 ms │ 14.23 ms │ 14.16 ms │ 14.17 ms │ 100 │ 100
├─ 176400 21.14 ms │ 22.38 ms │ 21.29 ms │ 21.35 ms │ 100 │ 100
├─ 192000 25.14 ms │ 25.84 ms │ 25.43 ms │ 25.47 ms │ 100 │ 100
├─ 352800 41.62 ms │ 43.17 ms │ 42.3 ms │ 42.25 ms │ 100 │ 100
╰─ 384000 46.84 ms │ 67.99 ms │ 48.41 ms │ 48.47 ms │ 100 │ 100
Semantics of channels, sample_rate & parameters_changed
Current situation: greedy
Semantics are best explained with an example:
next-> (sample at sample_rate S1 and channel count C1)parameters_changed-> falsenext-> (sample at sample_rate S1 and channel count C1)parameters_changed-> truenext-> (sample at sample_rate S2 and channel count C2)parameters_changed-> falsenext-> (sample at sample_rate S2 and channel count C2)parameters_changed-> falsenext-> (sample at sample_rate S2 and channel count C2)
In words, before you call next you get that the parameters are going to change. Said differently you get whether the next sample has a different channel count or sample rate then the previous sample.
Pro:
- Sources that need to handle channel count and sample rate changes are simpler.
- Similar semantics to
current_span_length, you know ahead of time when to change.
Cons:
- Ever source that can cause sample_rate or channel_count to change gets rather complex, needing to know ahead of time if that is going to happen.
- Seems slower to me because of all the
buffering
Alternative: lazy
Semantics again explained with an example:
next-> (sample at sample_rate S1 and channel count C1)parameters_changed-> falsenext-> (sample at sample_rate S1 and channel count C1)parameters_changed-> falsenext-> (sample at sample_rate S2 and channel count C2)parameters_changed-> truenext-> (sample at sample_rate S2 and channel count C2)parameters_changed-> falsenext-> (sample at sample_rate S2 and channel count C2)
In words, after you call next you learn whether the sample you just got has a different sample rate or channel count then the samples before.
Pro:
- Ever source that can cause sample_rate or channel_count get simpler when they cause a change they can flip a
bool, reset it after the next sample has been emitted (consumer callsnext). - Similar semantics to
current_span_length, you know ahead of time when to change. - Sources that need to handle channel count and sample rate changes are simpler.
Cons:
- Larger difference to how things are now
- I just wrote 2.5k lines with the approach above and I do not want to change them :cry:
Note that sources that need to know if the sample_rate or channel_count changes do not need to buffer across calls to next. They:
- call next
- call parameters_changed
- realize they did
- query the new parameters using
channel_countandsample_rate - handle the sample together with the new parameters
EDIT: might not do the thing below but go for the greedy approach anyway, see: https://github.com/RustAudio/rodio/pull/706#issuecomment-2694150147
changing semantics for lazy option above, todo list:
- [x] buffered
- [ ] channel_volume
- [ ] crossfade
- [ ] delay
- [ ] done
- [ ] empty
- [ ] empty_callback
- [ ] fadein
- [ ] fadeout
- [ ] from_factory
- [ ] from_iter
- [ ] linear_ramp
- [ ] pausable
- [x] peekable
- [ ] position
- [ ] repeat
- [ ] skip_duration
- [ ] skippable
- [ ] stoppable
- [ ] take_duration
- [ ] take_samples
- [x] take_span*, removed: impossible to implement with lazy*
- [ ] uniform
- [ ] queue
yet another idea: have both styles of parameters_changed available (compiler can probably get rid of those not used as dead code?)
Our linear resampler might be even broken. There are some comments that suggest it was not completely polished. It used numerator/divisor calculation which now may probably be more straightforward. I would leave this to a separate task. After this change is made.
Regarding that greedy/lazy discussion. I do not really see how flagging before next frame may cause problems. I would like to keep changes to in-between frames only still. Sources that change number parameters themselves have all information about the change and should have no problems flagging it. The ones that do not change parameters or do not depend on them can just relay call to parameters_changed.
The sources that combine inputs should process whole frames and then check whether a change is needed.
Can you give an example how this approach may affect to buffering? If whole frame has to be processed it should be read completely anyway, if samples are independent, just loop for number of samples and check then.
With that lazy option, flagging after reading next sample and releasing is depending on next() call is confusing. Next source would have to read the sample, keep it somewhere then read new parameters than maybe re-allocate its buffers and then put that sample as the first one... On other words it is not clear to me which problem it is supposed to solve.
I do not really see how flagging before next frame may cause problems.
It does not cause problems but it may require a more complex implementation. Honestly both approaches are complex and it seems I have made a mistake making it worse. Ill get back on that later. I have been designing #712 yesterday and want to try working on that a while. Then we could compare performance and code complexity. Getting back to your question, lets say you are the queue source:
- you have to provide the correct parameters for the next sample
- at any point the current source may end, you do not know when
Thus you need to peek one sample ahead so you know if your current sample is actually the last. This kind of needing to peek one ahead happens a lot, I wrote a PeekingSource. To handle a span of one sample PeekingSource needs to when it reads ahead track: parameter changes, sample rate and channel count.
You might have already seen the mistake, we can just forbid spans of length one. They make no sense anyway, why would you want 1 sample (at 44_100 samples a second!) to have a different channel count? So peeking is simple again, removing this complexity. Quite a few sources need to know if 'the end is coming' so this unneeded complexity was duplicated at a few places. That is what prompted me to look at alternatives.
I am still working on #712, I still think that is valuable and if its partially implemented we can compare its performance to this PR.