web-audio-api-rs

Proof of concept of single-channel convolution engine

Open orottier opened this issue 3 years ago • 3 comments

Getting started with #21, this is an extremely simple, computationally intensive, single-channel convolution engine.

~Try it out with speakers very low because the normalization is not implemented yet!!~ normalization is now implemented!
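For readers wondering what normalization means here: the sketch below follows the RMS-based scaling described for ConvolverNode in the Web Audio API specification, reduced to a single channel. It is an illustration under that assumption, not necessarily the code in this PR; the function name `normalization_scale` is hypothetical.

```rust
/// Gain to apply to a single-channel impulse response so that the processed
/// signal is perceived roughly as loud as the dry one.
/// Assumption: constants and steps follow the Web Audio API spec's
/// normalization procedure; `normalization_scale` is an illustrative name.
fn normalization_scale(ir: &[f32], sample_rate: f32) -> f32 {
    const GAIN_CALIBRATION: f32 = 0.00125;
    const GAIN_CALIBRATION_SAMPLE_RATE: f32 = 44100.0;
    const MIN_POWER: f32 = 0.000125;

    // RMS power of the impulse response
    let sum_of_squares: f32 = ir.iter().map(|s| s * s).sum();
    let mut power = (sum_of_squares / ir.len() as f32).sqrt();

    // Protect against overload for silent, empty, or broken buffers
    if !power.is_finite() || power < MIN_POWER {
        power = MIN_POWER;
    }

    // Calibrate the perceived volume and compensate for the sample rate
    GAIN_CALIBRATION / power * GAIN_CALIBRATION_SAMPLE_RATE / sample_rate
}
```

Without such a scale, a long reverb tail can sum to a much louder output than the input, which is why the original warning about speaker volume was there.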

cargo run --release --example convolution

My CPU barely keeps up with the larger response buffer, but this sets a baseline for further improvement

orottier avatar Jun 28 '22 06:06 orottier

Didn't really check the implementation itself as I'm not very familiar with such frequency domain stuff.

I tried to run the example and it crashes each time when switching to small room I think:

Dry
Small room
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: InputValues(true, true)', src/node/convolver.rs:342:20

For information and for the record, a colleague (who is a sound engineer of sorts) pointed me to this C++ implementation, which he considers both good quality and very efficient: http://www.angelofarina.it/Public/X-MCFX_convolver/ (the source code seems to be available at https://github.com/JB-Luke/X-MCFX/tree/master/x-mcfx-convolver)

b-ma avatar Jul 01 '22 13:07 b-ma

Yeah, it's a good point to decide now how to continue.

The current implementation is a terribly inefficient, mostly correct, version of the overlap-save method of performing convolution in FFT space.
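For reference, a minimal sketch of the overlap-save idea (not the code in this PR). It assumes a naive O(n²) DFT in place of a real FFT so that it stays dependency-free, and all function names are illustrative. Each step transforms a window holding the last `ir.len() - 1` input samples plus `block` new ones, multiplies it with the impulse response spectrum, and keeps only the last `block` output samples, since the first ones are corrupted by circular wrap-around.

```rust
use std::f32::consts::PI;

/// Naive O(n^2) DFT on (re, im) pairs. A real engine would use an FFT;
/// this stand-in only keeps the sketch free of external crates.
fn dft(input: &[(f32, f32)], inverse: bool) -> Vec<(f32, f32)> {
    let n = input.len();
    let sign: f32 = if inverse { 1.0 } else { -1.0 };
    (0..n)
        .map(|k| {
            let (mut re, mut im) = (0.0f32, 0.0f32);
            for (t, &(xr, xi)) in input.iter().enumerate() {
                let angle = sign * 2.0 * PI * ((k * t) % n) as f32 / n as f32;
                let (s, c) = angle.sin_cos();
                re += xr * c - xi * s;
                im += xr * s + xi * c;
            }
            if inverse {
                (re / n as f32, im / n as f32)
            } else {
                (re, im)
            }
        })
        .collect()
}

/// Overlap-save convolution, consuming `block` new input samples per step.
/// Assumes `ir` is non-empty and `block > 0`.
fn overlap_save(signal: &[f32], ir: &[f32], block: usize) -> Vec<f32> {
    let m = ir.len();
    let n = block + m - 1; // transform size

    // Spectrum of the zero-padded impulse response, computed once
    let mut h: Vec<(f32, f32)> = ir.iter().map(|&x| (x, 0.0)).collect();
    h.resize(n, (0.0, 0.0));
    let h = dft(&h, false);

    let out_len = signal.len() + m - 1;
    let mut out = Vec::with_capacity(out_len + block);
    let mut window = vec![0.0f32; n];
    let mut pos = 0;
    while out.len() < out_len {
        // Keep the last m - 1 samples as history, then append the new block
        window.copy_within(block.., 0);
        for i in 0..block {
            window[m - 1 + i] = signal.get(pos + i).copied().unwrap_or(0.0);
        }
        pos += block;

        // Circular convolution through the frequency domain
        let x: Vec<(f32, f32)> = window.iter().map(|&v| (v, 0.0)).collect();
        let x = dft(&x, false);
        let y: Vec<(f32, f32)> = x
            .iter()
            .zip(&h)
            .map(|(&(ar, ai), &(br, bi))| (ar * br - ai * bi, ar * bi + ai * br))
            .collect();
        let y = dft(&y, true);

        // The first m - 1 samples are aliased; save only the valid tail
        out.extend(y[m - 1..].iter().map(|&(re, _)| re));
    }
    out.truncate(out_len);
    out
}

/// Direct-form convolution, useful as a reference to compare against.
fn direct_convolve(signal: &[f32], ir: &[f32]) -> Vec<f32> {
    let mut out = vec![0.0; signal.len() + ir.len() - 1];
    for (i, &x) in signal.iter().enumerate() {
        for (j, &h) in ir.iter().enumerate() {
            out[i + j] += x * h;
        }
    }
    out
}
```

With a single transform spanning the whole impulse response, the transform size grows with the reverb length, which is one reason long responses get expensive.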

There are a few options:

  • merge this (after fixing your panic) with a disclaimer 'experimental - only use with short reverbs'
  • bind to a state of the art C++ implementation (thanks for the reference!) - this makes me a bit sad because it will no longer be a pure rust lib
  • implement a state of the art convolution engine ourselves - it will be a lot of fun and a lot of hard work. For example https://cse.hkust.edu.hk/mjg_lib/bibs/DPSu/DPSu.Files/Ga95.PDF feels within my range of expertise

We could do all three. I will think about it and maybe we get some insights from the WAC

orottier avatar Jul 01 '22 15:07 orottier

Or port the C++ code to Rust. It is about 5k lines of code..

orottier avatar Jul 01 '22 16:07 orottier

Hey @b-ma, it took me some time to continue with this. I had a look at both porting and binding the existing C++ code, but that is outside my range of expertise. Hence, for now, I opted for incremental improvements to the current implementation. Using Frequency Domain Delay Lines gives a nice performance improvement without overcomplicating things. Could you check if this works on your end?

Up next: adding tests (trivial impulse response, comparing with IIR, etc.), and then the next improvement: nonuniform partition scheduling.

For literature I am looking at https://www.eecs.qmul.ac.uk/~josh/documents/2017/Jillings%20IEEE%20WASPAA%202017.pdf and https://github.com/vtolani95/convolution/blob/master/reverb.py
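To illustrate why partitioning works at all, here is a small time-domain sketch (an assumption-laden illustration, not the PR's implementation; all names are made up). The impulse response is split into equal partitions, the signal is convolved with each partition separately, and the partial results are summed with a delay equal to the partition's offset. A frequency-domain delay line does the same thing with spectra: past input spectra are kept in a delay line and multiplied with the matching partition spectrum.

```rust
/// Direct-form convolution, used for each partition and as the reference.
/// Assumes both slices are non-empty.
fn convolve(signal: &[f32], ir: &[f32]) -> Vec<f32> {
    let mut out = vec![0.0; signal.len() + ir.len() - 1];
    for (i, &x) in signal.iter().enumerate() {
        for (j, &h) in ir.iter().enumerate() {
            out[i + j] += x * h;
        }
    }
    out
}

/// Uniformly partitioned convolution in the time domain.
/// `part` is the partition size in samples and must be non-zero.
fn partitioned_convolve(signal: &[f32], ir: &[f32], part: usize) -> Vec<f32> {
    let mut out = vec![0.0; signal.len() + ir.len() - 1];
    for (p, chunk) in ir.chunks(part).enumerate() {
        // Partition p starts p * part samples into the impulse response,
        // so its contribution is delayed by the same amount.
        let offset = p * part;
        for (i, y) in convolve(signal, chunk).into_iter().enumerate() {
            out[offset + i] += y;
        }
    }
    out
}
```

The same pair of functions doubles as a test fixture: a unit impulse `[1.0]` as the response should return the input unchanged, and any partition size should match the direct result up to rounding.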

orottier avatar Sep 05 '22 10:09 orottier

Benchmark result:


bench_ctor
  Instructions:              868101 (+0.163150%)
  L1 Accesses:              1756540 (+0.099214%)
  L2 Accesses:                 7853 (+0.268131%)
  RAM Accesses:               10361 (-0.038591%)
  Estimated Cycles:         2158440 (+0.079101%)

bench_sine
  Instructions:             9094532 (+1.155017%)
  L1 Accesses:             13652001 (+1.488827%)
  L2 Accesses:                29539 (+0.247743%)
  RAM Accesses:               12511 (+0.015988%)
  Estimated Cycles:        14237581 (+1.429863%)

bench_sine_gain
  Instructions:             9607190 (+1.132989%)
  L1 Accesses:             14426913 (+1.441830%)
  L2 Accesses:                31445 (-1.642165%)
  RAM Accesses:               12663 (+0.031598%)
  Estimated Cycles:        15027343 (+1.366429%)

bench_sine_gain_delay
  Instructions:            16683821 (+0.683987%)
  L1 Accesses:             24146109 (+0.891388%)
  L2 Accesses:                72024 (-3.073693%)
  RAM Accesses:               13853 (+0.021661%)
  Estimated Cycles:        24991084 (+0.814952%)

bench_buffer_src
  Instructions:            10935361 (+0.058560%)
  L1 Accesses:             18121362 (+0.051165%)
  L2 Accesses:                52896 (-3.825455%)
  RAM Accesses:               38663 (-0.025858%)
  Estimated Cycles:        19739047 (-0.008120%)

bench_buffer_src_iir
  Instructions:            21070468 (+0.039483%)
  L1 Accesses:             31513029 (+0.036506%)
  L2 Accesses:                53381 (-3.660055%)
  RAM Accesses:               38753 (-0.033535%)
  Estimated Cycles:        33136289 (+0.002731%)

bench_buffer_src_biquad
  Instructions:            15284320 (+0.101632%)
  L1 Accesses:             23717926 (+0.087411%)
  L2 Accesses:                74402 (-5.086173%)
  RAM Accesses:               38848 (-0.033453%)
  Estimated Cycles:        25449616 (+0.001273%)


github-actions[bot] avatar Sep 05 '22 10:09 github-actions[bot]

Hey, nice! I will have a look tomorrow.

Maybe https://www.dspguide.com/ch18.htm could be of some help too. I haven't checked it yet, and it may be a bit naive in terms of implementation, but this book is generally really nice for explaining the concepts.

b-ma avatar Sep 05 '22 14:09 b-ma

Benchmark result:


bench_ctor
  Instructions:              862422 (+0.073800%)
  L1 Accesses:              1747859 (+0.056043%)
  L2 Accesses:                 7840 (+0.076589%)
  RAM Accesses:               10332 (-0.318379%)
  Estimated Cycles:         2148679 (-0.006794%)

bench_sine
  Instructions:             9083928 (+1.093058%)
  L1 Accesses:             13637885 (+1.444348%)
  L2 Accesses:                29635 (+0.669203%)
  RAM Accesses:               12469 (-0.335705%)
  Estimated Cycles:        14222475 (+1.380654%)

bench_sine_gain
  Instructions:             9592804 (+1.035059%)
  L1 Accesses:             14408302 (+1.367226%)
  L2 Accesses:                31855 (+0.381295%)
  RAM Accesses:               12619 (-0.331727%)
  Estimated Cycles:        15009242 (+1.305852%)

bench_sine_gain_delay
  Instructions:            16663730 (+0.593599%)
  L1 Accesses:             24119507 (+0.822468%)
  L2 Accesses:                73911 (-2.901997%)
  RAM Accesses:               13810 (-0.303205%)
  Estimated Cycles:        24972412 (+0.743266%)

bench_buffer_src
  Instructions:            10934538 (+0.005405%)
  L1 Accesses:             18122921 (+0.016076%)
  L2 Accesses:                53094 (-3.712301%)
  RAM Accesses:               38629 (-0.118940%)
  Estimated Cycles:        19740406 (-0.045227%)

bench_buffer_src_iir
  Instructions:            21067790 (+0.002967%)
  L1 Accesses:             31512586 (+0.009626%)
  L2 Accesses:                53343 (-3.857037%)
  RAM Accesses:               38722 (-0.116078%)
  Estimated Cycles:        33134571 (-0.027885%)

bench_buffer_src_biquad
  Instructions:            15273970 (+0.000609%)
  L1 Accesses:             23712109 (+0.016851%)
  L2 Accesses:                71971 (-4.921000%)
  RAM Accesses:               38837 (-0.066902%)
  Estimated Cycles:        25431259 (-0.061069%)


github-actions[bot] avatar Sep 06 '22 06:09 github-actions[bot]

Hey, I didn't have time to look at the code, but all tests are passing on my side and the example does not crash anymore! Quite nice :)

b-ma avatar Sep 06 '22 18:09 b-ma

Benchmark result:


bench_ctor
  Instructions:              867269 (+0.067152%)
  L1 Accesses:              1755693 (+0.050946%)
  L2 Accesses:                 7842 (+0.127681%)
  RAM Accesses:               10327 (-0.366618%)
  Estimated Cycles:         2156348 (-0.017897%)

bench_sine
  Instructions:             9088697 (+1.090117%)
  L1 Accesses:             13645739 (+1.442276%)
  L2 Accesses:                29480 (+0.047512%)
  RAM Accesses:               12475 (-0.271804%)
  Estimated Cycles:        14229764 (+1.374174%)

bench_sine_gain
  Instructions:             9597521 (+1.031206%)
  L1 Accesses:             14415922 (+1.364547%)
  L2 Accesses:                31848 (-0.381608%)
  RAM Accesses:               12627 (-0.252785%)
  Estimated Cycles:        15017107 (+1.297382%)

bench_sine_gain_delay
  Instructions:            16668395 (+0.590894%)
  L1 Accesses:             24128345 (+0.817164%)
  L2 Accesses:                72601 (-2.297195%)
  RAM Accesses:               13819 (-0.223827%)
  Estimated Cycles:        24975015 (+0.750129%)

bench_buffer_src
  Instructions:            10929752 (+0.007476%)
  L1 Accesses:             18115523 (+0.019076%)
  L2 Accesses:                52767 (-4.060000%)
  RAM Accesses:               38631 (-0.106020%)
  Estimated Cycles:        19731443 (-0.046326%)

bench_buffer_src_iir
  Instructions:            21062988 (+0.003884%)
  L1 Accesses:             31504982 (+0.010885%)
  L2 Accesses:                53216 (-3.957841%)
  RAM Accesses:               38723 (-0.103191%)
  Estimated Cycles:        33126367 (-0.026968%)

bench_buffer_src_biquad
  Instructions:            15269353 (+0.003812%)
  L1 Accesses:             23704496 (+0.030886%)
  L2 Accesses:                72115 (-8.003674%)
  RAM Accesses:               38842 (-0.048892%)
  Estimated Cycles:        25424541 (-0.097119%)


github-actions[bot] avatar Sep 07 '22 06:09 github-actions[bot]

Good to hear. If the code looks good to you I intend to merge this version. I have created #220 for further improvements

orottier avatar Sep 07 '22 06:09 orottier

Except for my small comment, this seems pretty good to me. I didn't go into the details of the implementation (I'm interested but...), and the tests look pretty nice!

Seems that going to multichannel from here wouldn't be too complicated, no?

Also, there was a bench for the convolution in Paul's original benchmarks; we could add it too, to get an idea of where we stand (I can port it if you don't want to get your hands dirty with JS :)

b-ma avatar Sep 07 '22 11:09 b-ma

> Seems that going to multichannel from here wouldn't be too complicated, no?

Indeed, just a bit of extra bookkeeping

> Also, there was a bench for the convolution in Paul's original benchmarks; we could add it too, to get an idea of where we stand (I can port it if you don't want to get your hands dirty with JS :)

Good point, I can give it a try!

orottier avatar Sep 07 '22 15:09 orottier

Absolutely not related to the issue but, I'm happy it's out there and quite usable: https://www.npmjs.com/package/node-web-audio-api :)

(I will probably move the repo to my team's organization, so maybe I will need to re-invite you as collaborator)

b-ma avatar Sep 07 '22 16:09 b-ma

> Absolutely not related to the issue but, I'm happy it's out there and quite usable: https://www.npmjs.com/package/node-web-audio-api :)
>
> (I will probably move the repo to my team's organization, so maybe I will need to re-invite you as collaborator)

Really cool. Congrats on the milestone!

orottier avatar Sep 07 '22 17:09 orottier

Benchmark result:


bench_ctor
  Instructions:              862420 (+0.073568%)
  L1 Accesses:              1747854 (+0.055757%)
  L2 Accesses:                 7848 (+0.178708%)
  RAM Accesses:               10327 (-0.366618%)
  Estimated Cycles:         2148539 (-0.013310%)

bench_sine
  Instructions:             9083926 (+1.093036%)
  L1 Accesses:             13637996 (+1.445174%)
  L2 Accesses:                29514 (+0.258170%)
  RAM Accesses:               12477 (-0.271761%)
  Estimated Cycles:        14222261 (+1.379129%)

bench_sine_gain
  Instructions:             9592802 (+1.035038%)
  L1 Accesses:             14408556 (+1.369013%)
  L2 Accesses:                31589 (-0.456923%)
  RAM Accesses:               12629 (-0.252745%)
  Estimated Cycles:        15008516 (+1.300952%)

bench_sine_gain_delay
  Instructions:            16663728 (+0.593587%)
  L1 Accesses:             24121540 (+0.830966%)
  L2 Accesses:                71865 (-5.589858%)
  RAM Accesses:               13821 (-0.223794%)
  Estimated Cycles:        24964600 (+0.711751%)

bench_buffer_src
  Instructions:            10934536 (+0.005616%)
  L1 Accesses:             18123080 (+0.017141%)
  L2 Accesses:                52929 (-4.009793%)
  RAM Accesses:               38633 (-0.111180%)
  Estimated Cycles:        19739880 (-0.047870%)

bench_buffer_src_iir
  Instructions:            21067759 (+0.002758%)
  L1 Accesses:             31512705 (+0.009946%)
  L2 Accesses:                53188 (-4.132946%)
  RAM Accesses:               38728 (-0.095447%)
  Estimated Cycles:        33134125 (-0.029043%)

bench_buffer_src_biquad
  Instructions:            15273983 (+0.000694%)
  L1 Accesses:             23711616 (+0.014767%)
  L2 Accesses:                72479 (-4.248629%)
  RAM Accesses:               38841 (-0.056609%)
  Estimated Cycles:        25433446 (-0.052458%)


github-actions[bot] avatar Sep 07 '22 18:09 github-actions[bot]

Ready for final comments

orottier avatar Sep 07 '22 18:09 orottier

Benchmark result:


bench_ctor
  Instructions:              867269 (+0.067152%)
  L1 Accesses:              1755693 (+0.050946%)
  L2 Accesses:                 7842 (+0.127681%)
  RAM Accesses:               10327 (-0.366618%)
  Estimated Cycles:         2156348 (-0.017897%)

bench_sine
  Instructions:             9088697 (+1.090117%)
  L1 Accesses:             13645739 (+1.442276%)
  L2 Accesses:                29480 (+0.047512%)
  RAM Accesses:               12475 (-0.271804%)
  Estimated Cycles:        14229764 (+1.374174%)

bench_sine_gain
  Instructions:             9597521 (+1.031206%)
  L1 Accesses:             14415922 (+1.364547%)
  L2 Accesses:                31848 (-0.381608%)
  RAM Accesses:               12627 (-0.252785%)
  Estimated Cycles:        15017107 (+1.297382%)

bench_sine_gain_delay
  Instructions:            16668395 (+0.590894%)
  L1 Accesses:             24128345 (+0.817164%)
  L2 Accesses:                72601 (-2.297195%)
  RAM Accesses:               13819 (-0.223827%)
  Estimated Cycles:        24975015 (+0.750129%)

bench_buffer_src
  Instructions:            10929773 (+0.007430%)
  L1 Accesses:             18115543 (+0.019043%)
  L2 Accesses:                52766 (-4.063562%)
  RAM Accesses:               38633 (-0.103431%)
  Estimated Cycles:        19731528 (-0.046229%)

bench_buffer_src_iir
  Instructions:            21062978 (+0.003860%)
  L1 Accesses:             31504969 (+0.010863%)
  L2 Accesses:                53216 (-3.956107%)
  RAM Accesses:               38722 (-0.105771%)
  Estimated Cycles:        33126319 (-0.027080%)

bench_buffer_src_biquad
  Instructions:            15269350 (+0.003792%)
  L1 Accesses:             23704499 (+0.030898%)
  L2 Accesses:                72115 (-8.003674%)
  RAM Accesses:               38839 (-0.056612%)
  Estimated Cycles:        25424439 (-0.097520%)


github-actions[bot] avatar Sep 08 '22 15:09 github-actions[bot]

Seems we are good to go :) congrats!

b-ma avatar Sep 08 '22 16:09 b-ma