web-audio-api-rs
Optimization in audio graph traversal
Some improvements could be made in graph.rs when rendering an audio quantum:
- [x] Store edges more efficiently (currently a single HashMap that needs to be iterated many times)
- [ ] Use a specialized container type for the nodes. e.g. https://crates.io/crates/intmap
- [x] Avoid the remove and insert calls of the currently processing node. That was necessary for borrow reasons. Wrap nodes in `Cell` or equivalent
- [x] Clear the input buffers when processing of a Node is done (we don't need them anymore and this way they can be reused)
- ~~When a Node has only one outgoing connection in the graph, its outputs can be moved instead of copied to that Node's inputs~~ this is not useful because the inputs are immutable anyway and need to be copied/mutated to outputs
- [ ] Decouple the graph-topology code from the audio-specific code
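To make the "remove and insert for borrow reasons" item concrete, here is a minimal sketch of the pattern. The `Node` and `Graph` types below are hypothetical, heavily simplified stand-ins (a node just sums its inputs and adds one); they are not the crate's actual types.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for an audio node (assumption, not the crate's type).
struct Node {
    inputs: Vec<usize>, // ids of the upstream nodes
    output: f32,        // stands in for the rendered output buffer
}

struct Graph {
    nodes: HashMap<usize, Node>,
    order: Vec<usize>, // node ids, assumed to be topologically sorted already
}

impl Graph {
    fn render(&mut self) {
        for i in 0..self.order.len() {
            let id = self.order[i];
            // Remove the node so we own it mutably while still being able to
            // read the other nodes from the map (this is the "borrow reasons" part).
            let mut node = self.nodes.remove(&id).unwrap();
            let sum: f32 = node.inputs.iter().map(|src| self.nodes[src].output).sum();
            node.output = sum + 1.0;
            // Put the node back: one extra hash and insert per node per render call.
            self.nodes.insert(id, node);
        }
    }
}
```

The `remove`/`insert` pair runs once per node for every render quantum, which is why it shows up as a candidate for optimization.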
I have a branch ready for item 1 at feature/intmap-for-graph, but I will have a look at the other items first before deciding whether this is worth the hassle of an exotic dependency.
I took a stab at "Avoid the remove and insert calls of the currently processing node", @b-ma, but it's very tricky. We would need a lot of unsafe code to work around it, and I'm not sure that would be beneficial for the project. I tried a safe, intermediate solution but it had no benefits: 084d8187f7307ab4ca
I will have another look at "Use a specialized container type for the nodes" now.
> I tried a safe, intermediate solution but it had no benefits: https://github.com/orottier/web-audio-api-rs/commit/084d8187f7307ab4ca96d5ca95749454aa380eb2

Just tested, no sign of improvement either, sorry...
I just wonder whether we could also try to bypass the HashMap altogether, using some kind of `Vec<Option<Node>>` for the nodes and delegating to something like https://docs.rs/index-pool/latest/index_pool/ to manage the indexes. Then in the graph parsing we could just retrieve/reinsert the nodes with something like `let node = self.nodes.swap(index, None)`. Or is that silly?
Edit: actually I just misread the `swap` method; we would probably need something similar to what you did...
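For illustration, the `Vec<Option<Node>>` idea could be sketched as below. Since `swap` on a `Vec` exchanges two slots by index rather than swapping in a value, the sketch uses `Option::take` instead. The `Node` and `Graph` types are hypothetical simplifications, and index management (for example via index-pool) is assumed to happen elsewhere.

```rust
// Hypothetical, simplified node type (assumption, for illustration only).
struct Node {
    inputs: Vec<usize>, // slot indexes of the upstream nodes
    output: f32,        // stands in for the rendered output buffer
}

struct Graph {
    // Slot `i` holds the node with index `i`, or `None` if the slot is free.
    nodes: Vec<Option<Node>>,
    order: Vec<usize>, // slot indexes, assumed to be topologically sorted
}

impl Graph {
    fn render(&mut self) {
        for i in 0..self.order.len() {
            let index = self.order[i];
            // `Option::take` moves the node out and leaves `None` in the slot.
            let mut node = match self.nodes[index].take() {
                Some(node) => node,
                None => continue, // free slot, nothing to render
            };
            // Read the upstream outputs; a missing input counts as silence here.
            let sum: f32 = node
                .inputs
                .iter()
                .map(|&src| self.nodes[src].as_ref().map_or(0.0, |n| n.output))
                .sum();
            node.output = sum + 1.0;
            // Put the node back in its slot.
            self.nodes[index] = Some(node);
        }
    }
}
```

This is still a take-and-reinsert scheme, much like the one in the commit above, only with plain indexing instead of hashing.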
I took a new look at intmap: 6b7f11f80a4fe4. A very slight performance increase but maybe it's just noise. Also Granular synthesis seems to regress. I would say, no merge.
Next up, create more flamecharts to look for other optimizations before spending time on this again.
I just re-tested using `RefCell<Node>` in the HashMap and managed to get it working this time (slowly understanding some stuff :). It's here: https://github.com/orottier/web-audio-api-rs/compare/main...b-ma:web-audio-api-rs:test/graph-render and the perf improvements are quite good (better than Chrome on several test cases :)
before

| id | name | duration (ms) | Speedup vs. realtime | buffer.duration (s) |
|---|---|---|---|---|
| 1 | Baseline (silence) | 26 | 4615.4x | 120 |
| 2 | Simple source test without resampling (Mono) | 41 | 2926.8x | 120 |
| 3 | Simple source test without resampling (Stereo) | 55 | 2181.8x | 120 |
| 4 | Simple source test without resampling (Stereo and positionnal) | 173 | 693.6x | 120 |
| 5 | Simple source test with resampling (Mono) | 82 | 1463.4x | 120 |
| 6 | Simple source test with resampling (Stereo) | 116 | 1034.5x | 120 |
| 7 | Simple source test with resampling (Stereo and positionnal) | 232 | 517.2x | 120 |
| 8 | Upmix without resampling (Mono -> Stereo) | 46 | 2608.7x | 120 |
| 9 | Downmix without resampling (Stereo -> Mono) | 44 | 2727.3x | 120 |
| 10 | Simple mixing (100x same buffer) - be careful w/ volume here! | 1755 | 17.1x | 30 |
| 11 | Simple mixing (100 different buffers) - be careful w/ volume here! | 1733 | 17.3x | 30 |
| 12 | Simple mixing with gains | 340 | 352.9x | 120 |
| 13 | Granular synthesis | 2662 | 2.8x | 7.5 |
| 14 | Synth (Sawtooth with Envelope) | 3442 | 34.9x | 120 |
| 15 | Synth (Sawtooth with gain - no automation) | 2778 | 43.2x | 120 |
| 16 | Synth (Sawtooth without gain) | 1681 | 71.4x | 120 |
| 17 | Substractive Synth | 423 | 283.7x | 120 |
| 18 | Stereo panning | 82 | 1463.4x | 120 |
| 19 | Stereo panning with automation | 82 | 1463.4x | 120 |
| 20 | Sawtooth with automation | 75 | 1600.0x | 120 |
| 21 | Stereo source with delay | 210 | 571.4x | 120 |

after

| id | name | duration (ms) | Speedup vs. realtime | buffer.duration (s) |
|---|---|---|---|---|
| 1 | Baseline (silence) | 21 | 5714.3x | 120 |
| 2 | Simple source test without resampling (Mono) | 30 | 4000.0x | 120 |
| 3 | Simple source test without resampling (Stereo) | 44 | 2727.3x | 120 |
| 4 | Simple source test without resampling (Stereo and positionnal) | 158 | 759.5x | 120 |
| 5 | Simple source test with resampling (Mono) | 75 | 1600.0x | 120 |
| 6 | Simple source test with resampling (Stereo) | 106 | 1132.1x | 120 |
| 7 | Simple source test with resampling (Stereo and positionnal) | 209 | 574.2x | 120 |
| 8 | Upmix without resampling (Mono -> Stereo) | 39 | 3076.9x | 120 |
| 9 | Downmix without resampling (Stereo -> Mono) | 35 | 3428.6x | 120 |
| 10 | Simple mixing (100x same buffer) - be careful w/ volume here! | 1599 | 18.8x | 30 |
| 11 | Simple mixing (100 different buffers) - be careful w/ volume here! | 1604 | 18.7x | 30 |
| 12 | Simple mixing with gains | 300 | 400.0x | 120 |
| 13 | Granular synthesis | 2347 | 3.2x | 7.5 |
| 14 | Synth (Sawtooth with Envelope) | 2899 | 41.4x | 120 |
| 15 | Synth (Sawtooth with gain - no automation) | 2212 | 54.2x | 120 |
| 16 | Synth (Sawtooth without gain) | 1332 | 90.1x | 120 |
| 17 | Substractive Synth | 414 | 289.9x | 120 |
| 18 | Stereo panning | 71 | 1690.1x | 120 |
| 19 | Stereo panning with automation | 73 | 1643.8x | 120 |
| 20 | Sawtooth with automation | 62 | 1935.5x | 120 |
| 21 | Stereo source with delay | 201 | 597.0x | 120 |

The downside is that I didn't manage to get rid of unsafe code in 2 places. It is very localized and seems to be the same problem each time (i.e. returning a reference to the buffer in `Graph::render()` and `AudioParamValues::get()`), so maybe you have an idea of how to handle that?
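For readers following along, the general shape of the `RefCell<Node>` approach can be sketched as follows. This is only an illustration under simplifying assumptions, not the code from the linked branch: `Node` and `Graph` are hypothetical stand-ins, and the sketch copies a plain `f32` out of each upstream node instead of returning a reference to a buffer, so it does not touch the reference-returning problem described above.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical, simplified node type (assumption, not the crate's type).
struct Node {
    inputs: Vec<usize>, // ids of the upstream nodes
    output: f32,        // stands in for the rendered output buffer
}

struct Graph {
    // Each node sits in a `RefCell`, so it can be mutated through a shared
    // reference to the map; the borrow rules are checked at runtime instead.
    nodes: HashMap<usize, RefCell<Node>>,
    order: Vec<usize>, // node ids, assumed to be topologically sorted
}

impl Graph {
    fn render(&self) {
        for id in &self.order {
            // Mutable runtime borrow of the current node: no remove/insert needed.
            let mut node = self.nodes[id].borrow_mut();
            // Shared runtime borrows of the upstream nodes. This would panic if a
            // node listed itself as an input, since it is already mutably borrowed.
            let sum: f32 = node
                .inputs
                .iter()
                .map(|src| self.nodes[src].borrow().output)
                .sum();
            node.output = sum + 1.0;
        }
    }
}
```

Compared with removing and reinserting the node, this costs a hash lookup plus a runtime borrow check per node, and `render` only needs `&self`.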
Amazing, I did not realize there was this much to gain still from the Graph::insert/remove stuff. Let's continue discussing at https://github.com/orottier/web-audio-api-rs/pull/199
I'm closing this issue because I think the leftover points are no longer really interesting, given the current implementation.