web-audio-api-rs
Proof of concept of single-channel convolution engine
Getting started with #21, this is an extremely simple, computationally intensive, single-channel convolution engine.
~Try it out with speakers very low because the normalization is not implemented yet!!~ normalization is now implemented!
cargo run --release --example convolution
My CPU barely keeps up with the larger response buffer, but this sets a baseline for further improvement
Didn't really check the implementation itself as I'm not very familiar with such frequency domain stuff.
I tried to run the example, and I think it crashes each time when switching to the small room:
Dry
Small room
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: InputValues(true, true)', src/node/convolver.rs:342:20
For information and for the record, a colleague (who is a sound engineer of sorts) pointed me to this C++ implementation, which he considers both good quality and very efficient: http://www.angelofarina.it/Public/X-MCFX_convolver/. The source code seems to be available too: https://github.com/JB-Luke/X-MCFX/tree/master/x-mcfx-convolver
Yeah, it's a good point to decide now how to continue.
The current implementation is a terribly inefficient, mostly correct, version of the overlap-save method of performing convolution in FFT space.
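For readers following along, here is a minimal sketch of the overlap-save idea mentioned above. It is not the code in `src/node/convolver.rs`: the function names `dft`, `overlap_save` and `direct` are made up for illustration, and a naive O(n²) DFT stands in for a real FFT so the example has no dependencies.

```rust
/// Naive O(n^2) DFT over (re, im) pairs. `inverse` flips the sign of the
/// exponent and applies the 1/n scaling. A stand-in for a real FFT.
fn dft(input: &[(f64, f64)], inverse: bool) -> Vec<(f64, f64)> {
    let n = input.len();
    let sign = if inverse { 1.0 } else { -1.0 };
    let mut out = Vec::with_capacity(n);
    for k in 0..n {
        let (mut re, mut im) = (0.0f64, 0.0f64);
        for (t, &(xr, xi)) in input.iter().enumerate() {
            let angle = sign * 2.0 * std::f64::consts::PI * ((k * t) % n) as f64 / n as f64;
            let (s, c) = angle.sin_cos();
            re += xr * c - xi * s;
            im += xr * s + xi * c;
        }
        if inverse {
            re /= n as f64;
            im /= n as f64;
        }
        out.push((re, im));
    }
    out
}

/// Overlap-save convolution, consuming `block` new samples per step.
/// Returns the first `signal.len()` output samples (the reverb tail is dropped).
fn overlap_save(signal: &[f64], ir: &[f64], block: usize) -> Vec<f64> {
    assert!(block > 0 && !ir.is_empty());
    let n = block + ir.len() - 1; // transform size
    let tail = n - block; // number of history samples kept in the window

    // Spectrum of the zero-padded impulse response, computed once.
    let mut h: Vec<(f64, f64)> = ir.iter().map(|&v| (v, 0.0)).collect();
    h.resize(n, (0.0, 0.0));
    let h = dft(&h, false);

    let mut window = vec![0.0f64; n]; // last n input samples, oldest first
    let mut out = Vec::with_capacity(signal.len());
    for chunk in signal.chunks(block) {
        // Slide the window: drop the oldest `block` samples, append the new ones
        // (zero-padded if the final chunk is short).
        window.rotate_left(block);
        for i in 0..block {
            window[tail + i] = chunk.get(i).copied().unwrap_or(0.0);
        }
        let x: Vec<(f64, f64)> = window.iter().map(|&v| (v, 0.0)).collect();
        let x = dft(&x, false);
        // Pointwise complex multiplication equals circular convolution in time.
        let y: Vec<(f64, f64)> = x
            .iter()
            .zip(&h)
            .map(|(&(ar, ai), &(br, bi))| (ar * br - ai * bi, ar * bi + ai * br))
            .collect();
        let y = dft(&y, true);
        // The first `tail` samples are corrupted by wrap-around; save the rest.
        out.extend(y[tail..].iter().take(chunk.len()).map(|&(re, _)| re));
    }
    out
}

/// Time-domain reference, truncated to `signal.len()` samples.
fn direct(signal: &[f64], ir: &[f64]) -> Vec<f64> {
    (0..signal.len())
        .map(|n| {
            ir.iter()
                .enumerate()
                .filter(|(k, _)| *k <= n)
                .map(|(k, h)| h * signal[n - k])
                .sum::<f64>()
        })
        .collect()
}
```

Comparing `overlap_save(&signal, &ir, block)` against `direct(&signal, &ir)` for a short signal is a quick sanity check. The cost problem is visible in the sketch: the transform size grows with the full impulse response length.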
There are a few options:
- merge this (after fixing your panic) with a disclaimer 'experimental - only use with short reverbs'
- bind to a state of the art C++ implementation (thanks for the reference!) - this makes me a bit sad because it will no longer be a pure rust lib
- implement a state of the art convolution engine ourselves - it will be a lot of fun and a lot of hard work. For example https://cse.hkust.edu.hk/mjg_lib/bibs/DPSu/DPSu.Files/Ga95.PDF feels within my range of expertise
We could do all three. I will think about it and maybe we get some insights from the WAC
Or port the C++ code to Rust. It is about 5k lines of code..
Hey @b-ma, it took me some time to continue with this. I had a look at both porting and binding the existing C++ stuff, but that is not within my range of expertise. Hence for now I opted for an incremental improvement on the current implementation. Using Frequency Domain Delay Lines there is a nice performance improvement while not overcomplicating stuff. Could you check if this works on your end?

Up next: adding tests (trivial impulse response, comparing with IIR, etc.) and then the next improvement: nonuniform partition scheduling.

For literature I am looking at https://www.eecs.qmul.ac.uk/~josh/documents/2017/Jillings%20IEEE%20WASPAA%202017.pdf and https://github.com/vtolani95/convolution/blob/master/reverb.py
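To illustrate the general technique (not the code in this PR), here is a rough sketch of uniformly partitioned convolution with a frequency-domain delay line. The `FdlConvolver` type, its fields and the `dft` helper are hypothetical names, and a naive DFT is assumed in place of a real FFT to keep the example dependency-free.

```rust
use std::collections::VecDeque;

type Complex = (f64, f64);

/// Naive O(n^2) DFT; `inverse` flips the exponent sign and scales by 1/n.
fn dft(input: &[Complex], inverse: bool) -> Vec<Complex> {
    let n = input.len();
    let sign = if inverse { 1.0 } else { -1.0 };
    let mut out = Vec::with_capacity(n);
    for k in 0..n {
        let (mut re, mut im) = (0.0f64, 0.0f64);
        for (t, &(xr, xi)) in input.iter().enumerate() {
            let angle = sign * 2.0 * std::f64::consts::PI * ((k * t) % n) as f64 / n as f64;
            let (s, c) = angle.sin_cos();
            re += xr * c - xi * s;
            im += xr * s + xi * c;
        }
        if inverse {
            re /= n as f64;
            im /= n as f64;
        }
        out.push((re, im));
    }
    out
}

/// Hypothetical convolver: the impulse response is split into partitions of
/// `block` samples, and past input spectra are kept in a delay line.
struct FdlConvolver {
    block: usize,
    ir_spectra: Vec<Vec<Complex>>, // one spectrum per impulse response partition
    fdl: VecDeque<Vec<Complex>>,   // input spectra, newest at the front
    prev: Vec<f64>,                // previous input block (overlap-save history)
}

impl FdlConvolver {
    fn new(ir: &[f64], block: usize) -> Self {
        assert!(block > 0);
        // Zero-pad each partition to 2 * block and transform it once up front.
        let ir_spectra: Vec<Vec<Complex>> = ir
            .chunks(block)
            .map(|part| {
                let mut padded: Vec<Complex> = part.iter().map(|&v| (v, 0.0)).collect();
                padded.resize(2 * block, (0.0, 0.0));
                dft(&padded, false)
            })
            .collect();
        let fdl = VecDeque::from(vec![vec![(0.0, 0.0); 2 * block]; ir_spectra.len()]);
        Self { block, ir_spectra, fdl, prev: vec![0.0; block] }
    }

    /// Consumes exactly `block` input samples and returns `block` output samples.
    fn process(&mut self, input: &[f64]) -> Vec<f64> {
        assert_eq!(input.len(), self.block);
        // Overlap-save window: previous block followed by the current one.
        let window: Vec<Complex> =
            self.prev.iter().chain(input).map(|&v| (v, 0.0)).collect();
        self.prev.copy_from_slice(input);

        // Only one forward transform per block: older spectra are reused.
        self.fdl.pop_back();
        self.fdl.push_front(dft(&window, false));

        // Spectrum delayed by k blocks is multiplied with partition k, then summed.
        let mut acc: Vec<Complex> = vec![(0.0, 0.0); 2 * self.block];
        for (x, h) in self.fdl.iter().zip(&self.ir_spectra) {
            for (a, (&(xr, xi), &(hr, hi))) in acc.iter_mut().zip(x.iter().zip(h)) {
                a.0 += xr * hr - xi * hi;
                a.1 += xr * hi + xi * hr;
            }
        }

        // One inverse transform; the first half is wrap-around and is discarded.
        let y = dft(&acc, true);
        let out: Vec<f64> = y[self.block..].iter().map(|&(re, _)| re).collect();
        out
    }
}
```

Feeding a signal through `process` block by block should match a direct time-domain convolution, and an impulse response of `[1.0]` should return the input unchanged, which is the kind of trivial impulse response test mentioned above.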
Benchmark result:
bench_ctor
Instructions: 868101 (+0.163150%)
L1 Accesses: 1756540 (+0.099214%)
L2 Accesses: 7853 (+0.268131%)
RAM Accesses: 10361 (-0.038591%)
Estimated Cycles: 2158440 (+0.079101%)
bench_sine
Instructions: 9094532 (+1.155017%)
L1 Accesses: 13652001 (+1.488827%)
L2 Accesses: 29539 (+0.247743%)
RAM Accesses: 12511 (+0.015988%)
Estimated Cycles: 14237581 (+1.429863%)
bench_sine_gain
Instructions: 9607190 (+1.132989%)
L1 Accesses: 14426913 (+1.441830%)
L2 Accesses: 31445 (-1.642165%)
RAM Accesses: 12663 (+0.031598%)
Estimated Cycles: 15027343 (+1.366429%)
bench_sine_gain_delay
Instructions: 16683821 (+0.683987%)
L1 Accesses: 24146109 (+0.891388%)
L2 Accesses: 72024 (-3.073693%)
RAM Accesses: 13853 (+0.021661%)
Estimated Cycles: 24991084 (+0.814952%)
bench_buffer_src
Instructions: 10935361 (+0.058560%)
L1 Accesses: 18121362 (+0.051165%)
L2 Accesses: 52896 (-3.825455%)
RAM Accesses: 38663 (-0.025858%)
Estimated Cycles: 19739047 (-0.008120%)
bench_buffer_src_iir
Instructions: 21070468 (+0.039483%)
L1 Accesses: 31513029 (+0.036506%)
L2 Accesses: 53381 (-3.660055%)
RAM Accesses: 38753 (-0.033535%)
Estimated Cycles: 33136289 (+0.002731%)
bench_buffer_src_biquad
Instructions: 15284320 (+0.101632%)
L1 Accesses: 23717926 (+0.087411%)
L2 Accesses: 74402 (-5.086173%)
RAM Accesses: 38848 (-0.033453%)
Estimated Cycles: 25449616 (+0.001273%)
Hey, nice! I will have a look tomorrow
Maybe this https://www.dspguide.com/ch18.htm could be of some help too. I didn't check it yet and it may be a bit naive in terms of implementation, but this book is generally really nice for explaining the concepts
Benchmark result:
bench_ctor
Instructions: 862422 (+0.073800%)
L1 Accesses: 1747859 (+0.056043%)
L2 Accesses: 7840 (+0.076589%)
RAM Accesses: 10332 (-0.318379%)
Estimated Cycles: 2148679 (-0.006794%)
bench_sine
Instructions: 9083928 (+1.093058%)
L1 Accesses: 13637885 (+1.444348%)
L2 Accesses: 29635 (+0.669203%)
RAM Accesses: 12469 (-0.335705%)
Estimated Cycles: 14222475 (+1.380654%)
bench_sine_gain
Instructions: 9592804 (+1.035059%)
L1 Accesses: 14408302 (+1.367226%)
L2 Accesses: 31855 (+0.381295%)
RAM Accesses: 12619 (-0.331727%)
Estimated Cycles: 15009242 (+1.305852%)
bench_sine_gain_delay
Instructions: 16663730 (+0.593599%)
L1 Accesses: 24119507 (+0.822468%)
L2 Accesses: 73911 (-2.901997%)
RAM Accesses: 13810 (-0.303205%)
Estimated Cycles: 24972412 (+0.743266%)
bench_buffer_src
Instructions: 10934538 (+0.005405%)
L1 Accesses: 18122921 (+0.016076%)
L2 Accesses: 53094 (-3.712301%)
RAM Accesses: 38629 (-0.118940%)
Estimated Cycles: 19740406 (-0.045227%)
bench_buffer_src_iir
Instructions: 21067790 (+0.002967%)
L1 Accesses: 31512586 (+0.009626%)
L2 Accesses: 53343 (-3.857037%)
RAM Accesses: 38722 (-0.116078%)
Estimated Cycles: 33134571 (-0.027885%)
bench_buffer_src_biquad
Instructions: 15273970 (+0.000609%)
L1 Accesses: 23712109 (+0.016851%)
L2 Accesses: 71971 (-4.921000%)
RAM Accesses: 38837 (-0.066902%)
Estimated Cycles: 25431259 (-0.061069%)
Hey, didn't have time to have a look at the code, but all tests are passing on my side and the example does not crash anymore! Quite nice :)
Benchmark result:
bench_ctor
Instructions: 867269 (+0.067152%)
L1 Accesses: 1755693 (+0.050946%)
L2 Accesses: 7842 (+0.127681%)
RAM Accesses: 10327 (-0.366618%)
Estimated Cycles: 2156348 (-0.017897%)
bench_sine
Instructions: 9088697 (+1.090117%)
L1 Accesses: 13645739 (+1.442276%)
L2 Accesses: 29480 (+0.047512%)
RAM Accesses: 12475 (-0.271804%)
Estimated Cycles: 14229764 (+1.374174%)
bench_sine_gain
Instructions: 9597521 (+1.031206%)
L1 Accesses: 14415922 (+1.364547%)
L2 Accesses: 31848 (-0.381608%)
RAM Accesses: 12627 (-0.252785%)
Estimated Cycles: 15017107 (+1.297382%)
bench_sine_gain_delay
Instructions: 16668395 (+0.590894%)
L1 Accesses: 24128345 (+0.817164%)
L2 Accesses: 72601 (-2.297195%)
RAM Accesses: 13819 (-0.223827%)
Estimated Cycles: 24975015 (+0.750129%)
bench_buffer_src
Instructions: 10929752 (+0.007476%)
L1 Accesses: 18115523 (+0.019076%)
L2 Accesses: 52767 (-4.060000%)
RAM Accesses: 38631 (-0.106020%)
Estimated Cycles: 19731443 (-0.046326%)
bench_buffer_src_iir
Instructions: 21062988 (+0.003884%)
L1 Accesses: 31504982 (+0.010885%)
L2 Accesses: 53216 (-3.957841%)
RAM Accesses: 38723 (-0.103191%)
Estimated Cycles: 33126367 (-0.026968%)
bench_buffer_src_biquad
Instructions: 15269353 (+0.003812%)
L1 Accesses: 23704496 (+0.030886%)
L2 Accesses: 72115 (-8.003674%)
RAM Accesses: 38842 (-0.048892%)
Estimated Cycles: 25424541 (-0.097119%)
Good to hear. If the code looks good to you I intend to merge this version. I have created #220 for further improvements
Apart from my small comment, this seems pretty good to me. I didn't go into the details of the implementation (I'm interested but...) and the tests look pretty nice!
Seems that going to multichannel from here wouldn't be too complicated, no?
Also there was a bench for the convolution in Paul's original benchmarks, we could add it too to have an idea of where we stand (I can port it if you don't want to get your hands dirty with JS :)
> Seems that going to multichannel from here wouldn't be too complicated, no?
Indeed, just a bit of extra bookkeeping
> Also there was a bench for the convolution in Paul's original benchmarks, we could add it too to have an idea of where we stand (I can port it if you don't want to get your hands dirty with JS :)
Good point, I can give it a try!
Absolutely not related to the issue but, I'm happy it's out there and quite usable: https://www.npmjs.com/package/node-web-audio-api :)
(I will probably move the repo to my team's organization, so maybe I will need to re-invite you as collaborator)
> Absolutely not related to the issue but, I'm happy it's out there and quite usable: https://www.npmjs.com/package/node-web-audio-api :)
> (I will probably move the repo to my team's organization, so maybe I will need to re-invite you as collaborator)
Really cool. Congrats on the milestone!
Benchmark result:
bench_ctor
Instructions: 862420 (+0.073568%)
L1 Accesses: 1747854 (+0.055757%)
L2 Accesses: 7848 (+0.178708%)
RAM Accesses: 10327 (-0.366618%)
Estimated Cycles: 2148539 (-0.013310%)
bench_sine
Instructions: 9083926 (+1.093036%)
L1 Accesses: 13637996 (+1.445174%)
L2 Accesses: 29514 (+0.258170%)
RAM Accesses: 12477 (-0.271761%)
Estimated Cycles: 14222261 (+1.379129%)
bench_sine_gain
Instructions: 9592802 (+1.035038%)
L1 Accesses: 14408556 (+1.369013%)
L2 Accesses: 31589 (-0.456923%)
RAM Accesses: 12629 (-0.252745%)
Estimated Cycles: 15008516 (+1.300952%)
bench_sine_gain_delay
Instructions: 16663728 (+0.593587%)
L1 Accesses: 24121540 (+0.830966%)
L2 Accesses: 71865 (-5.589858%)
RAM Accesses: 13821 (-0.223794%)
Estimated Cycles: 24964600 (+0.711751%)
bench_buffer_src
Instructions: 10934536 (+0.005616%)
L1 Accesses: 18123080 (+0.017141%)
L2 Accesses: 52929 (-4.009793%)
RAM Accesses: 38633 (-0.111180%)
Estimated Cycles: 19739880 (-0.047870%)
bench_buffer_src_iir
Instructions: 21067759 (+0.002758%)
L1 Accesses: 31512705 (+0.009946%)
L2 Accesses: 53188 (-4.132946%)
RAM Accesses: 38728 (-0.095447%)
Estimated Cycles: 33134125 (-0.029043%)
bench_buffer_src_biquad
Instructions: 15273983 (+0.000694%)
L1 Accesses: 23711616 (+0.014767%)
L2 Accesses: 72479 (-4.248629%)
RAM Accesses: 38841 (-0.056609%)
Estimated Cycles: 25433446 (-0.052458%)
Ready for final comments
Benchmark result:
bench_ctor
Instructions: 867269 (+0.067152%)
L1 Accesses: 1755693 (+0.050946%)
L2 Accesses: 7842 (+0.127681%)
RAM Accesses: 10327 (-0.366618%)
Estimated Cycles: 2156348 (-0.017897%)
bench_sine
Instructions: 9088697 (+1.090117%)
L1 Accesses: 13645739 (+1.442276%)
L2 Accesses: 29480 (+0.047512%)
RAM Accesses: 12475 (-0.271804%)
Estimated Cycles: 14229764 (+1.374174%)
bench_sine_gain
Instructions: 9597521 (+1.031206%)
L1 Accesses: 14415922 (+1.364547%)
L2 Accesses: 31848 (-0.381608%)
RAM Accesses: 12627 (-0.252785%)
Estimated Cycles: 15017107 (+1.297382%)
bench_sine_gain_delay
Instructions: 16668395 (+0.590894%)
L1 Accesses: 24128345 (+0.817164%)
L2 Accesses: 72601 (-2.297195%)
RAM Accesses: 13819 (-0.223827%)
Estimated Cycles: 24975015 (+0.750129%)
bench_buffer_src
Instructions: 10929773 (+0.007430%)
L1 Accesses: 18115543 (+0.019043%)
L2 Accesses: 52766 (-4.063562%)
RAM Accesses: 38633 (-0.103431%)
Estimated Cycles: 19731528 (-0.046229%)
bench_buffer_src_iir
Instructions: 21062978 (+0.003860%)
L1 Accesses: 31504969 (+0.010863%)
L2 Accesses: 53216 (-3.956107%)
RAM Accesses: 38722 (-0.105771%)
Estimated Cycles: 33126319 (-0.027080%)
bench_buffer_src_biquad
Instructions: 15269350 (+0.003792%)
L1 Accesses: 23704499 (+0.030898%)
L2 Accesses: 72115 (-8.003674%)
RAM Accesses: 38839 (-0.056612%)
Estimated Cycles: 25424439 (-0.097520%)
Seems we are good to go :) congrats!